

Residuals Revisited

  The linear model we are using assumes that the relationship between the two variables is a perfect straight line.  The residuals are the part of the data that hasn’t been modeled.

 Residuals

  Residuals help tell us if the model makes sense.  When a regression is appropriate, it should model the underlying relationship.  Because nothing, in a sense, should be left behind, we usually plot the residuals in the hope of finding… nothing.

A scatterplot of the residuals versus the x-values should be the most boring scatterplot you’ve ever seen. It shouldn’t have any interesting features like a direction or shape. It should stretch horizontally, with about the same amount of scatter throughout. It should have no bends, and it should have no outliers.

If the residuals show no interesting pattern when we plot them against x, we can look at how big they are. Remember, we are trying to make the residuals as small as possible. Since their mean is always zero, though, it’s only sensible to look at how much they vary.
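A minimal sketch of that idea, using made-up (x, y) pairs rather than the Burger King data: fit a least-squares line, subtract the predictions from the observations, and confirm that the residuals average out to zero. The helper names `fit_line` and `residuals` are illustrative, not from the slides.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(xs, ys):
    """Residual = observed y minus the y predicted by the fitted line."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up illustration data (an assumption, not taken from the slides).
xs = [5, 10, 15, 20, 25, 30, 35]
ys = [12, 15, 24, 25, 33, 34, 44]

res = residuals(xs, ys)
print([round(e, 2) for e in res])
# For a least-squares line the residuals average to zero (up to rounding).
print(abs(mean(res)) < 1e-9)
```

Because the mean carries no information, the spread of `res` is the quantity worth summarizing.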

The Residual Standard Deviation

The Equal Variance Assumption is the assumption that the residuals have the same scatter throughout. The condition to check is the Does the Plot Thicken? Condition. We check to make sure the spread is about the same all along the line. We can check that either in the original scatterplot of y against x or in the scatterplot of residuals.
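As a rough sketch, the residual standard deviation is usually computed as the square root of the sum of squared residuals divided by n - 2 (that formula is the standard textbook definition, not something shown on this slide). The second helper, `spread_by_half`, is a hypothetical and deliberately crude numeric stand-in for the Does the Plot Thicken? check; looking at the plot remains the real test. The residuals below are made up.

```python
from math import sqrt

def residual_sd(residuals):
    """s_e = sqrt(sum(e^2) / (n - 2)); n - 2 because the line estimates two parameters."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    return sqrt(sum(e * e for e in residuals) / (n - 2))

def spread_by_half(xs, residuals):
    """Compare typical residual size for the low-x half and the high-x half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2

    def rms(part):
        return sqrt(sum(e * e for _, e in part) / len(part))

    return rms(pairs[:half]), rms(pairs[half:])

# Made-up residuals, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
res = [1.0, -2.0, 1.5, -0.5, 2.0, -2.0]

print(round(residual_sd(res), 3))
low, high = spread_by_half(xs, res)
print(round(low, 3), round(high, 3))  # similar values suggest similar scatter
```

If the two halves differ a lot, the single number from `residual_sd` is not a fair summary of the scatter.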

 The Residual SD

For the BK foods, the SD of the residuals is 9.2 grams of fat. That looks about right in the scatterplot of residuals. The residual for the BK Broiler chicken was -11 grams, just over one SD.
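The "just over one SD" remark is simple arithmetic on the two numbers quoted on the slide. The variable names are illustrative.

```python
residual_sd = 9.2        # grams of fat, from the slide
broiler_residual = -11   # grams of fat, from the slide

# Express the residual in units of the residual standard deviation.
sds_from_line = broiler_residual / residual_sd
print(round(sds_from_line, 2))  # about -1.2, i.e. just over one SD below the line
```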

It is a good idea to make a histogram of the residuals. If we see a unimodal, symmetric histogram, then we can apply the 68-95-99.7 Rule to see how well the regression model describes the data.
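One way to apply the rule numerically is to count how many residuals fall within one, two, and three SDs. This is a sketch with a hypothetical helper, `fraction_within`, and a made-up list of ten residuals; only the 9.2-gram SD comes from the slides.

```python
def fraction_within(residuals, sd, k):
    """Share of residuals whose absolute size is at most k standard deviations."""
    return sum(abs(e) <= k * sd for e in residuals) / len(residuals)

sd = 9.2  # residual SD in grams of fat, from the slides
# Made-up residuals, for illustration only.
res = [-11, 3, 8, -5, 14, -2, 20, -9, 1, -19]

for k in (1, 2, 3):
    print(k, fraction_within(res, sd, k))
```

With only ten made-up values the proportions are coarse; on real data from a roughly normal residual histogram they should land near 68%, 95%, and 99.7%.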

The variation of the residuals is the key to assessing how well the model fits. Let’s look at it using our favorite fast food example. The total Fat has a standard deviation of 16.4 grams. The standard deviation of the residuals is 9.2 grams.

  If the correlation were 1.0 and the model predicted Fat values perfectly, the residuals would all be zero and have no variation.  On the other hand, if the correlation were zero the model would simply predict 23.5 grams of Fat (the mean) for all menu items.  The residuals from that prediction would just be the observed Fat values minus their mean.  These residuals would have the same variability as the original data because, as we know, just subtracting the mean doesn’t change the spread.
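The last point is easy to confirm numerically. The fat values below are made up for illustration; the check works for any list of numbers.

```python
from statistics import mean, stdev

# Made-up fat values in grams (an assumption, not the menu data).
fat = [10, 19, 23, 31, 43, 15]

# "Residuals" from a model that predicts the mean for every item.
centered = [f - mean(fat) for f in fat]

print(round(stdev(fat), 4), round(stdev(centered), 4))  # the two spreads match
```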

  How well does our BK regression model do?  We want to know how much of the variation is left in the residuals.  Think of it as finding a percent between 0% and 100% that is the fraction of the variation left in the residuals.
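A back-of-the-envelope version of that percent, using the two standard deviations quoted earlier. The sample size `n = 30` is an assumption (the number of menu items in the textbook example), and the small correction factor assumes the residual SD was computed with n - 2 in the denominator while the SD of Fat used n - 1.

```python
sd_fat = 16.4        # grams, from the slides
sd_residuals = 9.2   # grams, from the slides
n = 30               # assumption: number of menu items in the example

# Quick version: ratio of the two variances.
ratio_of_variances = (sd_residuals / sd_fat) ** 2

# More careful version: undo the different denominators (n - 2 vs n - 1).
fraction_left = ratio_of_variances * (n - 2) / (n - 1)
fraction_accounted_for = 1 - fraction_left

print(f"left in residuals: {fraction_left:.1%}")
print(f"accounted for by the model: {fraction_accounted_for:.1%}")
```

Either way, roughly 30% of the variation in Fat is left in the residuals, so the model accounts for roughly 70%.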

  An “R-squared” of 77.3% indicates that 77.3% of the variation in maximum wind speed can be accounted for by the hurricane’s central pressure. Other factors, such as temperature and whether the storm is over water or land, may explain some of the remaining variation.

Just Checking…

90,000 square feet – the size of 50 average-sized family homes

B) Is the correlation of Price and Size positive or negative? How do you know?
Answer: It’s positive. The correlation and the slope have the same sign.

A final price tag of 75 million dollars

30 bedrooms and 20 bathrooms

You find that your house in Saratoga is worth $100,000 more than the regression model predicts. Should you be very surprised?
Answer: No. The standard deviation of the residuals is about 53 thousand dollars. We shouldn’t be surprised by any residual smaller than 2 standard deviations, and a residual of $100,000 is less than 2 × 53 thousand dollars.

car garage, three swimming pools

The amenities don't finish there. Also constructed is an adult movie theater with a balcony, four fireplaces, a formal dining room that seats 30, all 23 full baths with full-sized Jacuzzis, 160 triple-paned windows and Brazilian mahogany French-style doors that alone cost $4 million. The banquet kitchen features two large commercial gas stoves, four commercial built-in refrigerators and a Japanese-style steakhouse island that seats 12.

Regression Assumptions and Conditions

Quantitative Variable Condition
Linearity Assumption
Straight Enough Condition
Outlier Condition
For the standard deviation of the residuals to summarize the scatter, all the residuals should share the same spread.
Equal Variance Assumption
Does the Plot Thicken? Condition

A Tale of Two Regressions

With this equation we estimated that a sandwich with 30 grams of protein would have 35.9 grams of fat. Suppose, though, we knew the fat content and wanted to predict the amount of protein.
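Predicting in the other direction calls for a second regression, not an algebraic inversion of the first line. The sketch below uses made-up protein and fat values (not the menu data) and an illustrative helper, `slope`, to show that the two least-squares slopes are not reciprocals of each other; their product equals r squared.

```python
from math import sqrt
from statistics import mean

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    """Pearson correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustration data in grams (an assumption, not from the slides).
protein = [5, 10, 15, 20, 25, 30, 35]
fat = [12, 15, 24, 25, 33, 34, 44]

b_fat_on_protein = slope(protein, fat)   # predict fat from protein
b_protein_on_fat = slope(fat, protein)   # predict protein from fat
r = correlation(protein, fat)

print(round(b_fat_on_protein, 3), round(1 / b_fat_on_protein, 3))
print(round(b_protein_on_fat, 3))  # smaller than the naive reciprocal above
print(abs(b_fat_on_protein * b_protein_on_fat - r ** 2) < 1e-12)
```

Unless the correlation is exactly 1 or -1, solving the first equation for x gives a different line from the regression of x on y.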

 Two Regressions

Pg 195, 16, 22, 23, 25, 37