Chapter 8 Linear Regression
How can a model be created that represents the linear relationship between two quantitative variables?



Fat Versus Protein: An Example
• The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
[Scatterplot: total fat vs. protein, 30 Burger King menu items]
• How many grams of fat would an item with 25 grams of protein have?

What Is Linear Regression?
• Remember that a strong correlation suggests a linear relationship between two quantitative variables.
• We can say more about that relationship with a model.
• The linear relationship is modeled by a straight line through the data.
• The data points do not all fall on the line, but a straight line summarizes the overall direction of the data.

Regression and Residuals
• Some points will be above the line and some will be below it.
• The estimate made from a model is the predicted value (denoted ŷ).
• The difference between the actual value and the predicted value is known as the residual: residual = y − ŷ.

Residuals (cont.)
• A negative residual means the predicted value is too big (an overestimate).
• A positive residual means the predicted value is too small (an underestimate).
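As a minimal sketch of the sign convention on this slide (residual = actual − predicted), the following Python snippet labels a prediction as an overestimate or an underestimate. The function names and the numbers are made up for illustration and do not come from the Burger King data.

```python
def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted


def describe(res):
    """Translate the sign of a residual into plain language."""
    if res < 0:
        return "overestimate (prediction too big)"
    if res > 0:
        return "underestimate (prediction too small)"
    return "exact"


# Hypothetical actual and predicted values, for illustration only.
low = residual(actual=10.0, predicted=12.5)   # point below the line
high = residual(actual=15.0, predicted=12.5)  # point above the line
print(low, describe(low))
print(high, describe(high))
```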

Line of Best Fit
• Some residuals are positive (points above the line) and some are negative (points below the line).
• To measure how well the line fits, we cannot simply add up the residuals: the positives and the negatives cancel each other out. Instead we add the squared residuals.
• The line of best fit is the line for which the sum of the squared residuals is smallest.
• The regression line is also known as the least squares regression line (LSRL).
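The least squares idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slide author's method: the helper names are hypothetical, the data are made up, and the slope and intercept use the standard closed-form least squares formulas.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Add up the squared residuals (actual minus predicted) for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_least_squares(xs, ys)
plain_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))  # residuals cancel
best = sum_squared_residuals(xs, ys, b0, b1)
shifted = sum_squared_residuals(xs, ys, b0 + 0.5, b1)  # line moved up
tilted = sum_squared_residuals(xs, ys, b0, b1 + 0.1)   # line made steeper

print(b0, b1, plain_sum, best, shifted, tilted)
```

The plain sum of residuals comes out at essentially zero, which is why it cannot measure fit, while moving or tilting the fitted line makes the sum of squared residuals larger.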

Line of best fit
• It is written as ŷ = a + bx, or equivalently ŷ = b₀ + b₁x.

Slope of the regression line
• Our slope is always in units of y per unit of x.

Y-intercept
• Our intercept is always in units of y.

Residuals Revisited
• The model treats every point as if it lay on the straight line.
• Whatever part of a data value lies off the line is what the model has not accounted for.
• Data = Model + Residual
• Residual = Data − Model
• In symbols: e = y − ŷ

Example
• Given the regression line for the previous scatterplot:
Ŷ = x
Predicted Fat = protein
• What does the slope represent?
• What does the y-intercept mean?

Example continued
• Given the regression line for the previous scatterplot:
Ŷ = x
Predicted Fat = protein
• How much fat would we expect an item with 12 grams of protein to have?
• How much protein would an item with 15 grams of fat have?

Example continued
• Given the regression line for the previous scatterplot:
Ŷ = x
Predicted Fat = protein
• A Double Whopper sandwich has 48 grams of protein and 58 grams of fat. What is the residual in fat for this sandwich?

Example Burger King
• The following are select items from the Burger King menu with grams of fat and total calories:
Item | Calories | Grams of fat
Whopper | 650 | 37
Whopper with cheese | 730 | 44
Big King | 530 | 31
Hamburger | 230 | 9
Cheeseburger | 270 | 12
Tendergrill chicken sandwich | 460 | 21
Original chicken sandwich | 660 | 40
Big fish sandwich | 520 | 28
BK Veggie Burger | 390 | 16

Example Continued
• What is the regression line for the data?
• What is the slope in the context of the problem?
• What is the y-intercept in the context of the problem?
• A sandwich with 15 grams of fat would be expected to have how many calories?
• A sandwich with 450 calories would be expected to have how many grams of fat?
• A Bacon Cheeseburger has 13 grams of fat and 290 total calories. What is the residual in calories for this sandwich?
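One way to work through these questions is sketched below in Python, using the nine sandwiches from the Burger King table with fat as x and calories as y. The helper name is hypothetical, and the fit uses the standard closed-form least squares formulas rather than a calculator routine.

```python
# (grams of fat, calories) for the nine sandwiches listed in the example.
sandwiches = {
    "Whopper": (37, 650),
    "Whopper with cheese": (44, 730),
    "Big King": (31, 530),
    "Hamburger": (9, 230),
    "Cheeseburger": (12, 270),
    "Tendergrill chicken sandwich": (21, 460),
    "Original chicken sandwich": (40, 660),
    "Big fish sandwich": (28, 520),
    "BK Veggie Burger": (16, 390),
}


def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


fat = [f for f, _ in sandwiches.values()]
calories = [c for _, c in sandwiches.values()]
b0, b1 = fit_least_squares(fat, calories)

# Predicted calories for a sandwich with 15 grams of fat.
predicted_15 = b0 + b1 * 15

# Residual for the Bacon Cheeseburger (13 g fat, 290 calories): actual - predicted.
bacon_residual = 290 - (b0 + b1 * 13)

print(f"calories-hat = {b0:.2f} + {b1:.2f} * fat")
print(f"predicted calories at 15 g fat: {predicted_15:.1f}")
print(f"Bacon Cheeseburger residual: {bacon_residual:.1f}")
```

With these data the slope is roughly 13.6 calories per gram of fat and the intercept roughly 134 calories. The reverse question (fat from calories) is usually answered with a separate regression of fat on calories rather than by solving this equation for x.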

Conditions Required
1. Quantitative Variable condition
2. Straight Enough condition
3. Outlier condition

R-Squared
• R² gives the fraction of the data's variation accounted for by the model; 1 − R² is the fraction of the original variation left in the residuals.
• Example: for the Burger King sandwich data, R² ≈ 97.63%, so about 97.63% of the variation in calorie content is explained by fat content. The remaining 2.37% comes from other factors.
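The definition above can be checked directly: R² is one minus the ratio of the variation left in the residuals to the total variation in y. The Python sketch below does this for the nine Burger King sandwiches (fat as x, calories as y); the variable names are hypothetical.

```python
# Grams of fat and calories for the nine sandwiches in the Burger King example.
fat = [37, 44, 31, 9, 12, 21, 40, 28, 16]
calories = [650, 730, 530, 230, 270, 460, 660, 520, 390]

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(calories) / n

# Least squares slope and intercept (standard closed-form formulas).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, calories))
sxx = sum((x - mean_x) ** 2 for x in fat)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left in the residuals versus total variation in calories.
residual_variation = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(fat, calories)
)
total_variation = sum((y - mean_y) ** 2 for y in calories)

r_squared = 1 - residual_variation / total_variation
print(f"R^2 = {r_squared:.4f}, left in residuals = {1 - r_squared:.4f}")
```

For these data R² comes out near 0.976, which agrees with the 2.37% of variation the slide attributes to other factors.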

Residual Plot
• A plot of the residuals from the regression line.
• A noticeable pattern in the residual plot may indicate that the regression line is not a good model.
• The residual plot of a well-fitting model shows random scatter with no pattern.

What not to do
• Don't fit a straight line to a nonlinear relationship.
• Beware of extraordinary points.
• Don't extrapolate beyond the data.
• Don't infer that x causes y just because there is a good linear model for their relationship.
• Don't choose a model based on R² alone.

Breakfast Cereals, Sugar and Calories
The following data come from 77 breakfast cereals and relate the sugar in each cereal to its calories.
R =
Calories: mean – , SD – 19.5
Sugar: mean – 7.0 grams, SD – 4.4
What is the slope of the regression line?
What is the y-intercept?
Write the regression equation.
Interpret
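Exercises like this one give only summary statistics, so the line has to be built from the correlation, the means and the standard deviations. A minimal Python sketch, assuming the standard relations slope = r · (SD of y / SD of x) and intercept = mean of y − slope · mean of x, follows. The function name and all the numbers in the usage example are hypothetical and are not the cereal values.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (intercept, slope) of the regression line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical summary statistics, for illustration only.
b0, b1 = line_from_summary(r=0.5, mean_x=10.0, sd_x=2.0, mean_y=100.0, sd_y=8.0)
print(f"y-hat = {b0} + {b1} * x")
```

The slope is in units of y per unit of x, and the line always passes through the point (mean of x, mean of y).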

Urban planning
• We want to estimate the costs per person associated with traffic delays.
• 2002 Urban Mobility Report (70 cities in 2000)
• Annual cost per person: mean – $, SD – $
• Average speed per person: mean – mph, SD – mph
• R =
• Write an equation to model this situation.
• What does the slope mean?

What to watch out for in Regression
• Interpreting beyond the data (extrapolating)
• Influential points
• Lurking variables
• Linear regression that is not "linear", and what to do about it

Extrapolation
• We cannot assume that a linear relationship in the data exists beyond the range of the data.
• Once we venture into new x territory, such a prediction is called an extrapolation.
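A simple guard against accidental extrapolation is to flag any prediction whose x lies outside the range of the data used to fit the line. The Python sketch below is one way to do that; the function name, the line and the x range are hypothetical.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Return (prediction, is_extrapolation) for a fitted line.

    x_min and x_max are the smallest and largest x values in the data
    that were used to fit the line.
    """
    prediction = intercept + slope * x
    is_extrapolation = x < x_min or x > x_max
    return prediction, is_extrapolation


# Hypothetical line fitted to data with x between 10 and 50.
inside = predict(30, intercept=2.0, slope=0.5, x_min=10, x_max=50)
outside = predict(80, intercept=2.0, slope=0.5, x_min=10, x_max=50)
print(inside)   # prediction within the data range
print(outside)  # flagged: x is beyond the data
```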

Extrapolation (cont.)
• A regression of mean age at first marriage for men vs. year, fit to the first 4 decades of the 20th century, does not hold for later years:
[Scatterplot: mean age at first marriage for men vs. year]

Influential Outliers
• We say that a point is influential if omitting it from the analysis gives a very different model.
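Influence can be checked by fitting the line twice, once with the suspect point and once without it, and comparing the slopes. The Python sketch below uses made-up data in which the last point sits far from the others in x; the helper name is hypothetical.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: five points along a rising line plus one far-off point.
xs = [1, 2, 3, 4, 5, 20]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

_, slope_with = fit_least_squares(xs, ys)
_, slope_without = fit_least_squares(xs[:-1], ys[:-1])

print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Here a single point changes the slope from clearly positive to slightly negative, so it is influential.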

Outliers, Leverage, and Influence (cont.)
• The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…
[Scatterplot not reproduced]

Lurking Variable
• No matter how straight the line, how strong the association, or how high the R-squared value, there is no way to conclude from regression alone that one variable causes the other.
• There is always the possibility that some third variable is driving both of the variables being observed.

What to do when the relationship is not straight
• Re-express the data with logs, square roots, or reciprocals. We will look primarily at square roots and logarithms.
Example: take the square root of the response variable, redraw the scatterplot with the re-expressed data, and examine the residual plot.
Example: re-express the data using a combination of logarithms, log(x) and log(y).
• Fit a curve to the data instead (more difficult).
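The effect of a re-expression shows up in the residuals. The Python sketch below uses made-up data that grow exponentially: a line fit to the raw values leaves a curved residual pattern (positive at both ends, negative in the middle), while a line fit to log(y) leaves essentially nothing. The helper name and the data are hypothetical.

```python
import math


def fit_residuals(xs, ys):
    """Fit a least squares line and return the list of residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up curved data: y grows by a constant factor at each step.
xs = list(range(8))
ys = [2 * 1.5 ** x for x in xs]

raw_residuals = fit_residuals(xs, ys)
log_residuals = fit_residuals(xs, [math.log10(y) for y in ys])

print("raw:", [round(r, 2) for r in raw_residuals])
print("log:", [round(r, 6) for r in log_residuals])
```

Real data will not straighten this cleanly, so the residual plot after re-expression still needs to be examined.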

The Ladder of Powers
Power | Name | Comment
2 | Square of data values | Try with unimodal distributions that are skewed to the left.
1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression.
½ | Square root of data values | Counts often benefit from a square root re-expression.
"0" | We'll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression.
−1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful.
−1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal.
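The ladder can be written as a single function that takes a data value and a power, with power 0 standing for the logarithm. This Python sketch is an illustration only: the function name is hypothetical, it uses base-10 logs as an assumption, and it applies the plain power for the negative rungs to match the names on the slide.

```python
import math


def re_express(value, power):
    """Re-express a positive data value using one rung of the ladder of powers.

    power 0 means "take the logarithm" (base 10 here, by assumption);
    any other power is applied directly, e.g. 0.5 for the square root
    and -1 for the reciprocal.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive data values")
    if power == 0:
        return math.log10(value)
    return value ** power


# Walk one made-up data value down the ladder.
for power in (2, 1, 0.5, 0, -0.5, -1):
    print(power, re_express(16, power))
```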

Plan B: Attack of the Logarithms (cont.)

Why Not Just a Curve?
• If there's a curve in the scatterplot, why not just fit a curve to the data?

Why Not Just a Curve? (cont.)
• The mathematics and calculations for "curves of best fit" are considerably more difficult than "lines of best fit."
• Besides, straight lines are easy to understand.
• We know how to think about the slope and the y-intercept.

Example: Data collected in the study of water pollution from commercial and domestic waste
Day | Oxygen Demand