Chapter 8 Linear Regression
Copyright © 2010 Pearson Education, Inc.

Linear Regression

 The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:  The model won’t be perfect, regardless of the line we draw. Some points will be above the line and some will be below.

 The estimate made from a model is the predicted value (denoted ŷ, read “y-hat”).  The difference between the observed value and its associated predicted value is called the residual: residual = observed value − predicted value.  A negative residual means the predicted value is too big (an overestimate).  A positive residual means the predicted value is too small (an underestimate).

 Some residuals are positive, others are negative, and, on average, they cancel each other out.  So, we can’t assess how well the line fits by adding up all the residuals.  Similar to what we did with deviations, we square the residuals and add the squares.  The smaller the sum, the better the fit.  The line of best fit is the line for which the sum of the squared residuals is smallest.
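The least-squares idea above can be sketched in a few lines of Python. The data set below is hypothetical, made up for illustration (it is not the Burger King data):

```python
# Hypothetical data for illustration (not the Burger King data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def sum_squared_residuals(b0, b1):
    """Sum of squared residuals for the candidate line yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# The least-squares line for this data is yhat = 0.15 + 1.94x.
# Its sum of squared residuals beats any other candidate line's:
sse_best = sum_squared_residuals(0.15, 1.94)
sse_other = sum_squared_residuals(0.0, 1.5)

# For the least-squares line the raw residuals cancel out (they sum to ~0),
# which is exactly why we needed to square them before adding.
raw_sum = sum(y - (0.15 + 1.94 * x) for x, y in zip(xs, ys))
```

Trying any other slope and intercept pair in `sum_squared_residuals` gives a larger sum, which is what “line of best fit” means here.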

 In Algebra, a straight line can be written as: y = mx + b.  In Statistics, we use a slightly different notation: ŷ = b0 + b1x.  We write ŷ to emphasize that the points that satisfy this equation are just our predicted values, not the actual data values.  The coefficient b1 is the slope, which tells us how rapidly ŷ changes with respect to x.  The coefficient b0 is the intercept, which tells where the line hits (intercepts) the y-axis.

 In our model, we have a slope (b1): ◦ The slope is built from the correlation and the standard deviations: b1 = r(sy/sx). ◦ Our slope is always in units of y per unit of x.  In our model, we also have an intercept (b0): ◦ The intercept is built from the means and the slope: b0 = ȳ − b1x̄. ◦ Our intercept is always in units of y.

 The regression line for the Burger King data fits the data well: ◦ The equation is ŷ = 6.8 + 0.97x, where ŷ is the predicted fat (in grams) and x is the protein (in grams). ◦ The predicted fat content for a BK Broiler chicken sandwich (30 grams of protein) is 6.8 + 0.97(30) = 35.9 grams of fat.
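Prediction is just plugging into the fitted equation. A minimal sketch using the chapter’s fitted values (intercept 6.8, slope 0.97):

```python
def predicted_fat(protein_grams):
    """Predicted total fat (g) from protein (g), using the chapter's
    fitted line: fat-hat = 6.8 + 0.97 * protein."""
    return 6.8 + 0.97 * protein_grams

# A BK Broiler chicken sandwich has 30 g of protein:
fat_hat = predicted_fat(30)       # about 35.9 g of predicted fat
```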

 The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that hasn’t been modeled: Residual = Data − Model, or, in symbols, e = y − ŷ. Residuals help us to see whether the model makes sense.

 When a regression model is appropriate, nothing interesting should be left behind.  After we fit a regression model, we usually plot the residuals in the hope of finding…nothing.  The residuals for the BK menu regression look appropriately boring: the residual plot shows no pattern.

 The variation in the residuals is the key to assessing how well the model fits.  In the BK menu items example, total fat has a standard deviation of 16.4 grams. The standard deviation of the residuals is 9.2 grams.

 If the correlation were 1.0 and the model predicted the fat values perfectly, the residuals would all be zero and have no variation.  As it is, the correlation is 0.83—not perfection.  However, we did see that the model residuals had less variation than total fat alone.  We can determine how much of the variation is accounted for by the model and how much is left in the residuals.  The squared correlation, r², gives the fraction of the data’s variance accounted for by the model.

 For the BK model, r² = 0.83² = 0.69, so 31% of the variability in total fat has been left in the residuals.  All regression analyses include this statistic, although by tradition it is written R² (pronounced “R-squared”). An R² of 0 means that none of the variance in the data is in the model; all of it is still in the residuals.  When interpreting a regression model, you need to Tell what R² means. ◦ In the BK example, according to our linear model, 69% of the variation in total fat is accounted for by variation in the protein content.
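The arithmetic here can be checked directly against the standard deviations reported earlier: the fraction of variance left in the residuals, (s_resid/s_fat)², agrees with 1 − r² up to rounding (the residual standard deviation is computed with slightly different degrees of freedom, so the match is approximate):

```python
r = 0.83            # correlation between protein and fat (from the text)
s_fat = 16.4        # standard deviation of total fat, in grams
s_resid = 9.2       # standard deviation of the residuals, in grams

r_squared = r ** 2                          # fraction of variance explained, ~0.69
left_in_residuals = (s_resid / s_fat) ** 2  # fraction of variance left over, ~0.31
# 1 - left_in_residuals agrees with r_squared to within rounding.
```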

 R² is always between 0% and 100%. What makes a “good” R² value depends on the kind of data you are analyzing and on what you want to do with it.  Along with the slope and intercept for a regression, you should always report R² so that readers can judge for themselves how successful the regression is at fitting the data.

 Don’t fit a straight line to a nonlinear relationship.  Beware of extraordinary points (y-values that stand off from the linear pattern, or extreme x-values).  Don’t invert the regression: to swap the predictor-response roles of the variables, we must fit a new regression equation.  Don’t extrapolate beyond the data—the linear model may no longer hold outside of the range of the data.  Don’t infer that x causes y just because there is a good linear model for their relationship—association is not causation.  Don’t choose a model based on R² alone.

 Pages 216–223  Problems # 1, 3, 5, 13, 15, 27, 39, 41 (omit parts e and f), 45, 47.