Least-Squares Regression: Prediction, Outliers, Influential Points and Extrapolation (Section 3.3, Part II)

Presentation transcript:


What You’ll Learn
How to use the LSRL for prediction
Why we need to be cautious when predicting outside our original data
How to spot an “outlier”
The effect of an “influential point”

Using the LSRL for prediction
Once we have determined that we have a useful model, we tend to use the model for prediction. Let’s consider a new set of data: the distance (in miles) and the airfare for twelve destinations from Baltimore, Maryland. A scatterplot is included.

Traveling from Baltimore
Based on this scatterplot, does it seem that knowing the distance to a destination would be useful for predicting the airfare?

Should we use a linear model?
The scatterplot looks promising.
Let’s check out the correlation coefficient and a plot of the residuals.
r = .795, r² = .632
The residual plot does not appear to have a pattern, so it looks like we can use our linear model.
[Simple linear regression results: dependent variable Airfare, independent variable Distance, sample size 12; numeric output values not recoverable]
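The checks on this slide (fit a line, look at r, inspect the residuals) can be sketched in a few lines of Python. This is only an illustration: the helper name `fit_lsrl` is hypothetical, and the five data points are made up for the example because the original table of twelve Baltimore airfares is not part of this transcript.

```python
from math import sqrt

def fit_lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# Made-up illustrative data (NOT the Baltimore airfare data).
distance = [200, 400, 600, 800, 1000]
fare = [100, 130, 150, 185, 210]

a, b, r = fit_lsrl(distance, fare)

# Residual = observed - predicted, one per data point.
residuals = [y - (a + b * x) for x, y in zip(distance, fare)]

print(f"fare = {a:.2f} + {b:.4f} * distance, r = {r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting `residuals` against `distance` and looking for a pattern is the residual-plot check described above; residuals from a least-squares line always sum to zero.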

Using the model for prediction
LSRL for Distance vs Airfare: Airfare = a + b(Distance), where a is the y-intercept and b is the slope of the fitted line.
When we use our regression equation for prediction, remember we are finding the “average” response value for a particular explanatory value.
This means that our predicted values will not always agree with actual observed values. We will under-predict for some and over-predict for others.
To see this in action, let’s consider predicting for one of the distances we used to compute the LSRL.

Prediction
Consider flying from Baltimore to Atlanta, 576 miles. This point is circled on both the fitted line plot and the residual plot.
The actual (observed) airfare for this flight is $178.00.
Our line predicts Airfare = a + b(576) = $150.88.
The residual = obs - pred = $178.00 - $150.88 = $27.12.
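The arithmetic on this slide can be checked directly. The sketch below uses only the two figures quoted for the Atlanta flight (observed fare $178.00, residual $27.12) and the definition residual = observed - predicted; the variable names are illustrative.

```python
observed = 178.00         # actual Baltimore-to-Atlanta airfare quoted on the slide
residual_quoted = 27.12   # residual quoted on the slide

# residual = observed - predicted, so predicted = observed - residual
predicted = observed - residual_quoted

# Recompute the residual from its definition as a consistency check.
residual = observed - predicted

print(f"predicted = ${predicted:.2f}, residual = ${residual:.2f}")
```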

Prediction
Notice these three things:
The actual point is above the fitted line
The residual is above the “zero” line
The value of the residual is positive
All these indicate that our prediction line will under-predict for this particular airfare.

Over and Under Predictions
Over-predictions:
Point lies below the regression line
Residual lies below the “zero” line
Value of the residual is negative
Under-predictions:
Point lies above the regression line
Residual lies above the “zero” line
Value of the residual is positive
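The rule above depends only on the sign of the residual, which a small function makes explicit. The function name `classify_prediction` is hypothetical; the first example uses the Atlanta figures from the earlier slide, and the second uses made-up numbers.

```python
def classify_prediction(observed, predicted):
    """Label a prediction by the sign of its residual (observed - predicted)."""
    residual = observed - predicted
    if residual > 0:
        return "under-prediction"   # point lies above the regression line
    if residual < 0:
        return "over-prediction"    # point lies below the regression line
    return "exact"

# Atlanta flight: observed $178.00, predicted $150.88.
print(classify_prediction(178.00, 150.88))

# Made-up case where the line predicts more than was actually paid.
print(classify_prediction(120.00, 135.50))
```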

Prediction Errors
Why does this happen?
A relationship between two variables does NOT indicate that the explanatory variable causes changes in the response variable; it only describes the relationship between them.
In this case, r² = .63, which means that about 63% of the variation we see in airfare can be explained by the linear relationship with distance traveled.
This means that about 37% of the variation in price has yet to be explained. In other words, we may want to explore other variables that may affect the cost of the ticket, such as type of airport, season, etc.
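The 63% and 37% figures follow directly from the correlation coefficient reported earlier (r = .795). A short check, with illustrative variable names:

```python
r = 0.795                 # correlation coefficient quoted in the presentation

r_squared = r ** 2        # coefficient of determination
explained = round(100 * r_squared, 1)          # percent of variation explained
unexplained = round(100 * (1 - r_squared), 1)  # percent left unexplained

print(f"r^2 = {r_squared:.3f}")
print(f"explained: {explained}%, unexplained: {unexplained}%")
```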

So should we predict?
As long as we recognize that our predictions are an average response value for a given value of the explanatory variable, we will have some valuable information.
Let’s use our model to predict for a destination that is 900 miles from Baltimore.
Airfare = a + b(900) = $188.90
This means that if we were looking for a 900-mile flight, we would expect to pay about $188.90.
Which means that if we find a flight for $200.00, we might keep looking!
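The fitted coefficients are not shown in this transcript, but two predictions from the same line are: $188.90 at 900 miles, and $150.88 at 576 miles (the latter follows from the Atlanta slide, $178.00 - $27.12). Since both points lie on the regression line, the line can be approximately recovered from them. The sketch below does that; the recovered slope and intercept are an assumption based on rounded dollar amounts, not the presentation's actual regression output, and the function names are hypothetical.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

# Two (distance, predicted airfare) pairs taken from the slides.
intercept, slope = line_through((576, 150.88), (900, 188.90))

def predict_fare(distance):
    """Predicted average airfare (dollars) for a given distance (miles)."""
    return intercept + slope * distance

print(f"approximate line: fare = {intercept:.2f} + {slope:.4f} * distance")
print(f"predicted fare at 900 miles: ${predict_fare(900):.2f}")
```

Because the inputs are rounded to the cent, the recovered coefficients may differ slightly from the ones the original software reported.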

Another Prediction
What about the airfare from Baltimore to San Francisco, which is 2842 miles away?
Airfare = a + b(2842)
OK, so that’s reasonable, right?
Well, in 1998, when these data were gathered, a flight from Baltimore to San Francisco cost only $198.00! So, although we expect some error, this is much more than we are willing to accept.
Why are we so far off?

Predicting outside our data
Consider the distances we used to create the model. They range from New York at 189 miles to Denver at 1502 miles.
Our prediction for the flight to San Francisco assumes that the same relationship continues even though this distance is almost twice as far as Denver. We have no way of knowing if this relationship stays the same outside the domain of the original data. Prediction outside this domain is called “extrapolation”. This type of prediction is dangerous and should not be done.
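One way to guard against accidental extrapolation is to check each new x-value against the range of the data used to fit the line. A minimal sketch, using the 189-mile and 1502-mile endpoints stated above; the function name `is_extrapolation` is hypothetical.

```python
X_MIN, X_MAX = 189, 1502   # New York and Denver distances quoted on the slide

def is_extrapolation(distance, lo=X_MIN, hi=X_MAX):
    """True if the distance lies outside the range used to fit the line."""
    return distance < lo or distance > hi

for miles in (576, 900, 2842):
    label = "extrapolation" if is_extrapolation(miles) else "within data range"
    print(f"{miles} miles: {label}")

# San Francisco relative to the farthest destination in the data.
ratio = 2842 / X_MAX
print(f"San Francisco is {ratio:.2f} times as far as Denver")
```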

Unusual Points (Outliers & Influential Points)
Outliers are pieces of data that do not fit the overall pattern.
If a point lies far away from the regression line in the y-direction, it will have a large residual (either positive or negative).
Consider the following data, which show the relationship between the age (in months) at which a child first speaks and their subsequent score on a test of mental ability (the Gesell score).

Unusual Points
Notice the circled point.
This point is far from the regression line. We also notice that it will have a very large residual. Care should be taken to ensure that we have recorded this point correctly.

Unusual Points in the y-direction
Consider the regression analysis with this point included.
[Simple linear regression results: dependent variable Score, independent variable Age, sample size 21; numeric output values not recoverable]
Now consider the analysis without this point.
[Simple linear regression results: dependent variable Score, independent variable Age, sample size 20; numeric output values not recoverable]
Notice that although the y-intercept and slope changed slightly, the biggest change occurred in the value of the correlation coefficient, “r”, and thus “r²”. An outlier in the y-direction will weaken the strength of the linear relationship.
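The effect described here can be reproduced on a small synthetic data set. The numbers below are made up for illustration (they are NOT the Gesell data, which is not included in this transcript), and the helper `fit` is hypothetical. One point is placed far above the line near the middle of the x-range, and the fit is computed with and without it.

```python
from math import sqrt

def fit(xs, ys):
    """Return (intercept, slope, r) of the least-squares line for the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

# Made-up data following roughly y = 100 - 2x with small scatter.
xs_clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys_clean = [99, 95, 94, 93, 89, 88, 87, 83, 82, 81]

# Same data plus one outlier in the y-direction, near the middle of the x-range.
xs_all = xs_clean + [6]
ys_all = ys_clean + [120]

a_clean, b_clean, r_clean = fit(xs_clean, ys_clean)
a_all, b_all, r_all = fit(xs_all, ys_all)

print(f"without outlier: slope = {b_clean:.3f}, r = {r_clean:.3f}")
print(f"with outlier:    slope = {b_all:.3f}, r = {r_all:.3f}")
```

In this sketch the slope moves only a little when the outlier is added, while r moves much closer to zero, which mirrors the pattern the slide describes.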

Unusual Points in the X-direction
Again consider the data set for age vs score, and notice that a second unusual point exists. However, this point is extreme in the x-direction.
Notice that this point is close to the regression line and will not have a large residual.

Unusual Points in the X-direction
So how does a point like this affect the regression?
Consider the regression analysis with this point included.
[Simple linear regression results: dependent variable Score, independent variable Age, sample size 21; numeric output values not recoverable]
Now consider the analysis without this point.
[Simple linear regression results: dependent variable Score, independent variable Age, sample size 20; numeric output values not recoverable]
Notice that the LSRL y-intercept and slope change a great deal when this point is removed and the regression is repeated. Let’s look at scatterplots of both scenarios.

How an unusual point in the X-direction affects the LSRL
When we removed the point, our line changed a great deal. When a point has this type of effect on an LSRL, we call it an “influential point”.
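A small synthetic example shows how a single point that is extreme in x can pull the line toward itself. The data are made up for illustration (NOT the age vs score data from the presentation), and the helper `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up cluster with essentially no trend.
xs_cluster = [8, 9, 10, 11, 12, 13, 14, 15]
ys_cluster = [100, 96, 104, 98, 102, 97, 103, 100]

# One extra point far out in the x-direction.
x_far, y_far = 42, 60

a_with, b_with = fit_line(xs_cluster + [x_far], ys_cluster + [y_far])
a_without, b_without = fit_line(xs_cluster, ys_cluster)

# Residual of the far point relative to the line fitted WITH it included.
residual_far = y_far - (a_with + b_with * x_far)

print(f"slope with the point:    {b_with:.3f}")
print(f"slope without the point: {b_without:.3f}")
print(f"residual of the far point: {residual_far:.2f}")
```

In this sketch the far point has a small residual because the line has been pulled toward it, yet removing it changes the slope substantially. That combination is what marks a point as influential.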

Additional Resources
The Practice of Statistics (YMM)
The Practice of Statistics (YMS)

What you learned
How to use the LSRL for prediction
Why we need to be cautious when predicting outside our original data
How to spot an “outlier”
The effect of an “influential point”