Chapter 3 Describing Relationships Section 3.2

Presentation on theme: "Chapter 3 Describing Relationships Section 3.2"— Presentation transcript:

1 Chapter 3 Describing Relationships Section 3.2
Least-Squares Regression

2 Least-Squares Regression
MAKE predictions using regression lines, keeping in mind the dangers of extrapolation.
CALCULATE and interpret a residual.
INTERPRET the slope and y intercept of a regression line.
DETERMINE the equation of a least-squares regression line using technology or computer output.
CONSTRUCT and INTERPRET residual plots to assess whether a regression model is appropriate.

3 Least-Squares Regression
INTERPRET the standard deviation of the residuals and \(r^2\) and use these values to assess how well a least-squares regression line models the relationship between two variables.
DESCRIBE how the least-squares regression line, standard deviation of the residuals, and \(r^2\) are influenced by outliers.
FIND the slope and y intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation.

6 Regression Lines Linear (straight-line) relationships between two quantitative variables are common. A regression line summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. Regression lines are expressed in the form \(\hat{y} = b_0 + b_1 x\), where \(\hat{y}\) (pronounced “y-hat”) is the predicted value of y for a given value of x.

11 Prediction
A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is \(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\). Predict the price of a Ford F-150 that has been driven 100,000 miles.
\(\widehat{\text{price}} = 38257 - 0.1629\,(100000)\)
\(\widehat{\text{price}} = \$21{,}967\)
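
The substitution on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function name `predict_price` is an illustrative assumption, and the coefficients are the ones from the regression equation on the slide.

```python
def predict_price(miles_driven: float) -> float:
    """Predicted price (in dollars) from the slide's regression equation."""
    intercept = 38257.0  # y intercept b0, in dollars
    slope = -0.1629      # slope b1, in dollars per mile driven
    return intercept + slope * miles_driven


predicted = predict_price(100_000)
print(round(predicted))  # 21967
```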

17 Extrapolation
Can we predict the price of a Ford F-150 with 300,000 miles driven?
\(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\)
\(\widehat{\text{price}} = 38257 - 0.1629\,(300000)\)
\(\widehat{\text{price}} = -\$10{,}613\)
Extrapolation is the use of a regression line for prediction far outside the interval of x values used to obtain the line. Such predictions are often not accurate.
CAUTION: Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data.
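
One way to build this caution into code is to compare each x value against the range of x values used to fit the line before trusting the prediction. The sketch below is hedged: the helper `is_extrapolation` and the observed range (5,000 to 150,000 miles) are placeholder assumptions, since the data table is not reproduced in this transcript. It also flags any value outside the range, whereas the slide's definition speaks of values far outside it; how far is too far remains a judgment call.

```python
def predict_price(miles_driven: float) -> float:
    """Predicted price (in dollars) from the slide's regression equation."""
    return 38257.0 - 0.1629 * miles_driven


def is_extrapolation(x: float, x_min: float, x_max: float) -> bool:
    """True when x lies outside the interval of x values used to fit the line."""
    return x < x_min or x > x_max


# Placeholder range of observed mileages; NOT taken from the slide data.
OBSERVED_MIN, OBSERVED_MAX = 5_000, 150_000

for miles in (100_000, 300_000):
    note = "extrapolation, do not trust" if is_extrapolation(
        miles, OBSERVED_MIN, OBSERVED_MAX) else "within observed range"
    print(miles, round(predict_price(miles)), note)
```

The negative predicted price at 300,000 miles is the warning sign: the line is being used where no data supported it.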

21 Residuals In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the “leftover” variation in the response variable). A residual is the difference between the actual value of y and the value of y predicted by the regression line.
\(\text{residual} = \text{actual } y - \text{predicted } y = y - \hat{y}\)

32 Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is \(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\). Calculate and interpret the residual for the truck that was driven 70,583 miles.
Find the predicted price.
\(\widehat{\text{price}} = 38257 - 0.1629\,(70583)\)
\(\widehat{\text{price}} = \$26{,}759\)
Find the residual.
\(\text{residual} = \text{price} - \widehat{\text{price}}\)
\(\text{residual} = 21994 - 26759\)
\(\text{residual} = -\$4765\)
Interpret the residual. The actual price of this truck is $4765 less than the price predicted by the regression line with x = miles driven.
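
The two steps of this example (predict, then subtract) can be sketched in Python. The function names are illustrative assumptions; the actual price of $21,994 and the mileage of 70,583 are the values used on the slide.

```python
def predict_price(miles_driven: float) -> float:
    """Predicted price (in dollars) from the slide's regression equation."""
    return 38257.0 - 0.1629 * miles_driven


def residual(actual_y: float, predicted_y: float) -> float:
    """residual = actual y - predicted y"""
    return actual_y - predicted_y


predicted = predict_price(70_583)   # step 1: find the predicted price
res = residual(21_994, predicted)   # step 2: actual minus predicted

print(round(predicted))  # 26759
print(round(res))        # -4765
```

A negative residual means the point lies below the line: the actual value is smaller than the predicted one.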

34 Interpreting a Regression Line
A regression line is a model for the data, much like the density curves of Chapter 2. The y intercept and slope of the regression line describe what this model tells us about the relationship between the response variable y and the explanatory variable x. In the regression equation \(\hat{y} = b_0 + b_1 x\):
\(b_0\) is the y intercept, the predicted value of y when x = 0
\(b_1\) is the slope, the amount by which the predicted value of y changes when x increases by 1 unit

40 Interpreting a Regression Line
Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is \(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\). Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why.
Interpret the slope. The predicted price of a used Ford F-150 goes down by $0.1629 (16.29 cents) for each additional mile that the truck has been driven.
Interpret the y intercept. The y intercept, 38257, is the predicted price (in dollars) of a used Ford F-150 that has been driven 0 miles. (The y intercept does have meaning in this case, as it is possible to have a number of miles driven near 0 miles.)
CAUTION: When asked to interpret the slope or y intercept, it is very important to include the word predicted (or equivalent) in your response. Otherwise, it might appear that you believe the regression equation provides actual values of y.
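
Both interpretations can be read off the equation numerically. In this small sketch the function name and the starting mileage of 50,000 are arbitrary illustrative choices; any starting mileage gives the same change per mile because the model is a straight line.

```python
def predict_price(miles_driven: float) -> float:
    """Predicted price (in dollars) from the slide's regression equation."""
    return 38257.0 - 0.1629 * miles_driven


# Slope: change in the PREDICTED price when miles driven increases by 1.
change_per_mile = predict_price(50_001) - predict_price(50_000)

# y intercept: the PREDICTED price when miles driven is 0.
price_at_zero = predict_price(0)

print(round(change_per_mile, 4))  # about -0.1629 dollars per mile
print(price_at_zero)
```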

43 The Least-Squares Regression Line
There are many different lines we could use to model the association in a particular scatterplot. A good regression line makes the residuals as small as possible. The regression line we prefer is the one that minimizes the sum of the squared residuals. The least-squares regression line is the line that makes the sum of the squared residuals as small as possible.
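
To see what "as small as possible" means, the sketch below fits a line with the standard closed-form least-squares formulas and then checks that nudging the intercept or slope in any direction makes the sum of squared residuals larger. The data are made up for illustration (they are not the truck data), and the function names are assumptions.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1*x that minimizes
    the sum of squared residuals (standard closed-form solution)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (actual y - predicted y)^2 over all points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data, NOT the truck data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Every nudged line should have a larger sum of squared residuals.
for d_b0, d_b1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = sum_squared_residuals(xs, ys, b0 + d_b0, b1 + d_b1)
    print(f"b0 {d_b0:+.1f}, b1 {d_b1:+.1f}: {other:.3f} vs best {best:.3f}")
```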

44 p. 184 and 187 Using your Calculator
We are going to practice using your calculator to make a scatter plot and residual plot.

47 Determining if a Linear Model Is Appropriate: Residual Plots
One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at a residual plot. A residual plot is a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis.

50 Determining if a Linear Model Is Appropriate: Residual Plots
A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a regression model is appropriate:
The residual plot should show no obvious patterns.
The residuals should be relatively small in size.
A pattern in the residuals indicates that a linear model is not appropriate.

51 Determining if a Linear Model Is Appropriate: Residual Plots
How to Interpret a Residual Plot To determine whether the regression model is appropriate, look at the residual plot. If there is no leftover curved pattern in the residual plot, the regression model is appropriate. If there is a leftover curved pattern in the residual plot, consider using a regression model with a different form.
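
As an illustration of a "leftover curved pattern", the sketch below fits a least-squares line to made-up data that actually follow a curve (y = x²) and lists the (x, residual) pairs that a residual plot would display. The data and function name are assumptions chosen only to make the pattern obvious.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data that follow a curve (y = x^2), not a straight line.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

b0, b1 = fit_least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points of the residual plot.
for x, res in zip(xs, residuals):
    print(x, round(res, 2))
```

The residuals are positive at both ends and negative in the middle, a U shape. That leftover curve is the signal that a straight line is the wrong form of model for these data.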

53 Residual Plots Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is \(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\). For this model, technology produced the following residual plot. Is a linear model appropriate for these data? Explain. Because there is no obvious pattern left over in the residual plot, the linear model is appropriate.

54 How Well the Line Fits the Data: The Role of s and \(r^2\) in Regression
Start here today.

56 How Well the Line Fits the Data: The Role of s and \(r^2\) in Regression
To assess how well the line fits all the data, we need to consider the residuals for each observation, not just one. Using these residuals, we can estimate the “typical” prediction error when using the least-squares regression line. The standard deviation of the residuals s measures the size of a typical residual. That is, s measures the typical distance between the actual y values and the predicted y values.
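
The slide does not give a formula for s. A common convention in introductory texts for a line fitted with one explanatory variable is to divide the sum of squared residuals by n − 2 before taking the square root; the sketch below assumes that convention. The residual values are made up for illustration and the function name is hypothetical.

```python
import math


def residual_standard_deviation(residuals):
    """s = sqrt(sum of squared residuals / (n - 2)).
    The n - 2 divisor is an assumed convention, not stated on the slide."""
    n = len(residuals)
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))


# Made-up residuals in dollars, for illustration only.
residuals = [1200.0, -800.0, 300.0, -1500.0, 900.0, -100.0]

s = residual_standard_deviation(residuals)
print(round(s, 2))  # size of a "typical" prediction error, in dollars
```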

59 How Well the Line Fits the Data: The Role of s and \(r^2\) in Regression
The standard deviation of the residuals s gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y. The coefficient of determination \(r^2\) measures the percent reduction in the sum of squared residuals when using the least-squares regression line to make predictions, rather than the mean value of y. In other words, \(r^2\) measures the percent of the variability in the response variable that is accounted for by the least-squares regression line. \(r^2\) tells us how much better the least-squares regression line (LSRL) does at predicting values of y than simply guessing the mean y for each value in the dataset.
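
The "percent reduction" description translates directly into a calculation: compare the sum of squared errors from always guessing the mean of y with the sum of squared residuals from the least-squares line. The sketch below uses made-up data and hypothetical function names.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(xs, ys):
    """Fractional reduction in squared error from using the
    least-squares line instead of the mean of y."""
    b0, b1 = fit_least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_mean = sum((y - y_bar) ** 2 for y in ys)  # guessing the mean
    ss_line = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_line / ss_mean


# Made-up illustrative data, NOT the truck data from the slides.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

print(round(r_squared(xs, ys), 3))
```

Multiplying the result by 100 gives the percent of the variability in y accounted for by the line.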

64 How Well the Line Fits the Data: The Role of s and \(r^2\) in Regression
Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is \(\widehat{\text{price}} = 38257 - 0.1629\,(\text{miles driven})\). For this model, technology gives s = $5740 and \(r^2\) = 0.66. Interpret the value of s. Interpret the value of \(r^2\).
Interpret s. The actual price of a Ford F-150 is typically about $5740 away from the price predicted by the least-squares regression line with x = miles driven.
Interpret \(r^2\). About 66% of the variability in the price of a Ford F-150 is accounted for by the least-squares regression line with x = miles driven.

65 Interpreting Computer Regression Output
A number of statistical software packages produce similar regression output. Be sure you can locate
the slope \(b_1\)
the y intercept \(b_0\)
the value of s
the value of \(r^2\)

69 Calculating the Regression Equation from Summary Statistics
Using technology is often the most convenient way to find the equation of a least-squares regression line. It is also possible to calculate the equation of the least-squares regression line using only the means and standard deviations of the two variables and their correlation.
How to Calculate the Least-Squares Regression Line Using Summary Statistics
We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means \(\bar{x}\) and \(\bar{y}\) and the standard deviations \(s_x\) and \(s_y\) of the two variables and their correlation r. The least-squares regression line is the line \(\hat{y} = b_0 + b_1 x\) with slope \(b_1 = r \cdot \frac{s_y}{s_x}\) and y intercept \(b_0 = \bar{y} - b_1 \bar{x}\).
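
The two formulas can be sketched as a small Python function. The function names and the data are illustrative assumptions; the summary statistics are computed with the standard library's `statistics` module, and the correlation is computed from standardized values.

```python
import statistics


def correlation(xs, ys):
    """Correlation r: average product of standardized values (n - 1 divisor)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Slope b1 = r * s_y / s_x and y intercept b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data, NOT the truck data from the slides.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

b0, b1 = lsrl_from_summary(
    statistics.mean(xs), statistics.mean(ys),
    statistics.stdev(xs), statistics.stdev(ys),
    correlation(xs, ys),
)
print(round(b0, 4), round(b1, 4))
```

For these data the result matches the line obtained by minimizing the sum of squared residuals directly (slope 2.2, y intercept −0.5).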

73 Regression to the Mean
The scatterplot shows height versus foot length and the regression line \(\hat{y} = b_0 + b_1 x\). We have added four more lines to the graph:
a vertical line at the mean foot length \(\bar{x}\)
a vertical line at \(\bar{x} + s_x\)
a horizontal line at the mean height \(\bar{y}\)
a horizontal line at \(\bar{y} + s_y\)
For an increase of 1 standard deviation in the value of the explanatory variable x, the least-squares regression line predicts an increase of r standard deviations in the response variable y. This is called regression to the mean, because the values of y “regress” to their mean.
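
The "r standard deviations" claim follows from the slope and intercept formulas \(b_1 = r \cdot s_y / s_x\) and \(b_0 = \bar{y} - b_1 \bar{x}\). A short derivation, evaluating the least-squares line one standard deviation above the mean of x:

```latex
\begin{align*}
\hat{y} &= b_0 + b_1(\bar{x} + s_x) \\
        &= (\bar{y} - b_1\bar{x}) + b_1\bar{x} + b_1 s_x \\
        &= \bar{y} + r\,\frac{s_y}{s_x}\,s_x \\
        &= \bar{y} + r\,s_y
\end{align*}
```

Because r is always between −1 and 1, the predicted y is never more than one standard deviation \(s_y\) from \(\bar{y}\) in this situation, and it is strictly closer whenever the relationship is not perfectly linear.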

77 Correlation and Regression Wisdom
Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND REGRESSION LINES DESCRIBE ONLY LINEAR RELATIONSHIPS r = 0.816
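
A quick way to see this limitation is to compute the correlation for data with a perfect but curved relationship. The sketch below uses made-up points on the parabola y = x² (this is an illustration only, not the data behind the figure on the slide), and the function name is an assumption.

```python
import math


def correlation(xs, ys):
    """Correlation r between two lists of equal length."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y is completely determined by x, but the relationship is not linear.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(round(r, 4))  # essentially 0 despite the perfect curved relationship
```

A correlation near 0 here does not mean "no relationship"; it means "no linear relationship", which is why a scatterplot should always be checked first.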

81 Correlation and Regression Wisdom
Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND LEAST-SQUARES REGRESSION LINES ARE NOT RESISTANT
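
"Not resistant" means that a single unusual point can change the results substantially. The sketch below, using made-up data and a hypothetical helper, fits a least-squares line to five points that lie exactly on y = x and then refits after adding one outlier at (10, 0).

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data: five points exactly on the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
_, slope_before = fit_least_squares(xs, ys)

# Add a single outlier far to the right and well below the pattern.
_, slope_after = fit_least_squares(xs + [10], ys + [0])

print(round(slope_before, 3), round(slope_after, 3))
```

One added point is enough to turn a slope of 1 into a negative slope, so unusual points deserve a close look before the line is interpreted.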

85 Correlation and Regression Wisdom
Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. ASSOCIATION DOES NOT IMPLY CAUSATION When we study the relationship between two variables, we often hope to show that changes in the explanatory variable cause changes in the response variable. CAUTION: A strong association between two variables is not enough to draw conclusions about cause and effect.

86 Section Summary
MAKE predictions using regression lines, keeping in mind the dangers of extrapolation.
CALCULATE and interpret a residual.
INTERPRET the slope and y intercept of a regression line.
DETERMINE the equation of a least-squares regression line using technology or computer output.
CONSTRUCT and INTERPRET residual plots to assess whether a regression model is appropriate.

87 Section Summary
INTERPRET the standard deviation of the residuals and \(r^2\) and use these values to assess how well a least-squares regression line models the relationship between two variables.
DESCRIBE how the least-squares regression line, standard deviation of the residuals, and \(r^2\) are influenced by outliers.
FIND the slope and y intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation.

88 Assignment 3.2 p #56-68 EOE (Every Other Even) and all (56, 60, 64, 68, 70, all and Chapter 3 FRAPPY!) If you are stuck on any of these, look at the odd before or after and the answer in the back of your book. If you are still not sure text a friend or me for help (before 8pm). Tomorrow we will check homework and review for 3.2 Quiz.

