Residuals and Residual Plots © Copyright 2015. All rights reserved. www.cpalms.org
Introduction Maybe you noticed that you missed a lot of free throws in basketball games. You decided to practice your free throw shooting to improve. Maybe you told a joke that hurt your friend’s feelings. You remembered to be more sensitive around him or her in the future. We all learn from our mistakes. In mathematics, too, you can learn a lot about data by looking at error. That’s what this lesson is all about! Suggested Questions: What kinds of mistakes do you think we have in math?
Key Terms: residual, residual plot. Suggested Questions to students: “After reading this introduction, in your own words, what does residual mean to you?”, “Can you give me an example of a residual?”, “What could residual mean in a math class?”
Residuals The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) from the regression line is called the residual (e). Each data point has one residual. Residual = Observed value − Predicted value, that is, e = y − ŷ. For a least-squares regression line, both the sum and the mean of the residuals are equal to zero. Suggested Questions: What would be the observed data? And what would be the predicted data? (Clue: Think about the actual data and the regression line. Which one is observed and which one is predicted?) Why do you think the sum and the mean of the residuals equal zero? (Show the next slide and check that the total and the mean of the residuals equal zero.)
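The definition above can be checked numerically. The following is a minimal sketch, assuming a least-squares line fitted with the standard slope and intercept formulas; the data values and the helper names `fit_least_squares` and `residuals` are invented for illustration and do not come from the slides.

```python
def fit_least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus predicted y-hat, one per data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

slope, intercept = fit_least_squares(xs, ys)
res = residuals(xs, ys, slope, intercept)

total = sum(res)          # zero up to floating-point rounding
mean = total / len(res)   # therefore also zero up to rounding

print("slope:", slope, "intercept:", intercept)
print("residuals:", res)
print("sum:", total, "mean:", mean)
```

Some residuals come out positive and some negative, and they cancel. The cancellation relies on the line being the least-squares line; for an arbitrary line the sum is generally not zero.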
This slide shows that a residual measures how far a data point is from the regression line, measured vertically. Discuss that the positive residual has a value of 3, while the negative residual has a value of −2.
Residual Analysis in Regression The estimated linear regression line you calculated may not be the “best” linear regression line, and a linear regression model is not always appropriate for the data. For both reasons, you should assess the appropriateness of the model by calculating residuals and examining residual plots. Suggested Question: What does it mean for a model to be appropriate for the data?
Residual Plots A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot look randomly dispersed around the horizontal axis (no obvious pattern), a linear regression model is appropriate for the data; otherwise, a different linear equation or a non-linear model may be more appropriate. More details will come on how to build a residual plot in slides 15 through 17.
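A residual plot as described above can be sketched in a few lines. This is an illustrative example only: the line, the data, and the function names `residual_points` and `plot_residuals` are assumptions made for this sketch, and the drawing step assumes the matplotlib library is installed.

```python
def residual_points(xs, ys, predict):
    """Pair each x with its residual (observed y minus predicted y-hat)."""
    return [(x, y - predict(x)) for x, y in zip(xs, ys)]


def plot_residuals(points):
    """Draw the residual plot; requires matplotlib (assumed to be installed)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    xs = [p[0] for p in points]
    rs = [p[1] for p in points]
    plt.scatter(xs, rs)
    plt.axhline(0, color="gray", linewidth=1)  # the horizontal axis, residual = 0
    plt.xlabel("independent variable (x)")
    plt.ylabel("residual (y - y-hat)")
    plt.title("Residual plot")
    plt.show()


# Hypothetical line and data, invented for illustration only.
def predict(x):
    return 3 * x + 1


xs = [0, 1, 2, 3]
ys = [2, 3, 8, 9]

points = residual_points(xs, ys, predict)
print(points)

# Uncomment to draw the plot:
# plot_residuals(points)
```

Here the points alternate above and below the horizontal axis with no trend, which is the kind of scatter that suggests a linear model is reasonable.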
Random Pattern Emphasize that since the residuals do not seem to have a mathematical pattern, a linear regression equation is an appropriate model for the original data.
Non-random: Since there seems to be a mathematical pattern in the residual plot, a linear model is not appropriate for the original data.
Other Patterns: all residuals are above the line; all residuals are below the line; other.
Essential Ideas A residual is the difference between an observed data value and its predicted value from the regression equation (observed minus predicted), so it can be positive, negative, or zero. Analyzing residuals is a method to determine if a linear model is appropriate for the data set. A residual plot is a scatter plot with the independent variable on the x-axis and the residuals on the y-axis. The shape of a residual plot can be useful to determine whether there may be a more appropriate model for a data set.
Example 1 The equation y = −2x + 20 models the data in the table at the left. Is the model a good fit?
Step 1: Calculate the residuals and organize your calculations in the table. Plug the x-value into y = −2x + 20 to get the y-value from the model (the predicted value). Find the difference between the observed and predicted values. Residual = y − ŷ Each click will show a value in the table, starting from the y-value column going down and then in the Residual column, giving the residual amount. Students can try to calculate each value before the teacher reveals it.
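The table-building step can be sketched as follows. The model y = −2x + 20 is from the slide, but the slide's data table is not reproduced in this text, so the observed values below are hypothetical stand-ins chosen only to show the arithmetic.

```python
def model(x):
    """Predicted value y-hat from the slide's equation y = -2x + 20."""
    return -2 * x + 20


# Hypothetical observations (x, y); NOT the values from the slide's table.
observed = [(1, 19), (2, 15), (3, 15), (4, 11), (5, 11)]

# Build one row per data point: x, observed y, predicted y-hat, residual.
rows = []
for x, y in observed:
    y_hat = model(x)
    rows.append((x, y, y_hat, y - y_hat))

# Print the rows as a simple aligned table.
print(f"{'x':>3} {'y':>4} {'y-hat':>6} {'residual':>9}")
for x, y, y_hat, r in rows:
    print(f"{x:>3} {y:>4} {y_hat:>6} {r:>9}")
```

With these stand-in values the residuals alternate between 1 and −1, so the (x, residual) points would scatter on both sides of the horizontal axis.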
Step 2: Use the points (x, residual) to make a scatter plot. The points are randomly dispersed about the horizontal axis. So, the equation y = −2x + 20 is a good fit.
Example 2 The table at the left shows the ages x and salaries y (in thousands of dollars) of eight employees at a company. The equation y = 0.2x + 38 models the data. Is the model a good fit?
Step 1: Calculate the residuals and organize your results in a table. Plug the x-value into y = 0.2x + 38 to get the predicted value. Residual = y − ŷ Each click will show a value in the table, starting from the y-value column going down and then in the Residual column, giving the residual amount. Students can try to calculate each value before the teacher reveals it.
Step 2: Use the points (x, residual) to make a scatter plot. The points form a ∩-shaped pattern. So, the equation y = 0.2x + 38 does not model the data well.
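A ∩-shaped pattern can also be seen in the signs of the residuals when they are listed in order of x. The sketch below uses the slide's model y = 0.2x + 38, but the ages and salaries are hypothetical because the slide's table is not reproduced in this text. The sign string is an informal aid for this example, not a formal test of fit.

```python
def model(age):
    """Predicted salary (in thousands of dollars) from y = 0.2x + 38."""
    return 0.2 * age + 38


# Hypothetical (age, salary) pairs for eight employees; NOT the slide's data.
employees = [
    (25, 42), (30, 45), (35, 47), (40, 48.5),
    (45, 49), (50, 49), (55, 48), (60, 47),
]

# Residual = observed salary minus predicted salary, listed in order of age.
points = [(age, salary - model(age)) for age, salary in employees]

# Record the sign of each residual to make the shape easy to see.
signs = "".join("+" if r > 0 else "-" for _, r in points)

for age, r in points:
    print(age, round(r, 2))
print("signs by increasing age:", signs)
```

The residuals are negative at the youngest and oldest ages and positive in between, which is the ∩ shape: the line sits too high at both ends and too low in the middle, so a curved model may describe such data better.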
Recap: One way to determine how well a line of fit models a data set is to analyze residuals. A residual is the difference between the y-value of a data point and the corresponding y-value found using the line of fit. A residual can be positive, negative, or zero. A plot of the residuals shows how well a model fits a data set. If the model is a good fit, then the residual points will be randomly dispersed about the horizontal axis. If the model is not a good fit, then the residual points will form some type of pattern.