1 • Understand how to determine whether a data point is influential • Understand the difference between extrapolation and interpolation • Understand that lurking variables could influence an apparent association. AP Statistics Objectives Ch9

2 • Subset • Outlier • Leverage • Influential point • Lurking variable • Extrapolation Vocabulary

3 Leverage & Influential Points Vocabulary Chapter 9 Assignment Chp 8 Part I Answers Chp 8 Part II Answers

4 Chapter 9 Assignment pp.215-217 #10,12,14,16

5 Chapter 8 Part I Answers The explanatory variable (x) is initial drop, measured in feet, and the response variable (y) is duration, measured in seconds.

6 Chapter 8 Part I Answers The units of the slope are seconds per foot.

7 Chapter 8 Part I Answers The slope of the regression line predicting duration from initial drop should be positive. Coasters with higher initial drops probably provide longer rides.

8 Chapter 8 Part I Answers 12.4% of the variability in duration can be explained by variability in initial drop. (In other words, 12.4% of the variability in duration can be explained by the linear model.)

9 Chapter 8 Part I Answers a) +9 +9

10 Chapter 8 Part I Answers b) -50 -50

11 Chapter 8 Part I Answers c) +10 +10 15 4 18.75 0.8

12 Chapter 8 Part I Answers 4 0.8 d) -30 -30 -2 6 1.2

13 Chapter 8 Part I Answers The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.

14 Chapter 8 Part I Answers

15 The scattered residuals plot indicates an appropriate linear model.

16 Chapter 8 Part I Answers

17 The duration of a coaster whose initial drop is one standard deviation below the mean drop would be predicted to be about 0.352 standard deviations (in other words, r standard deviations) below the mean duration.

18 Chapter 8 Part I Answers The duration of a coaster whose initial drop is three standard deviations above the mean drop would be predicted to be about 1.056 (or 3 × 0.352) standard deviations above the mean duration.
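The two standard-deviation answers above both use the same rule: the predicted z-score of the response is r times the z-score of the explanatory variable. A minimal Python sketch of that rule, using r = 0.352 from the answers (the function name `predicted_z` is only illustrative):

```python
def predicted_z(r, z_x):
    """Predicted z-score of the response, given the z-score of x."""
    return r * z_x

R_COASTER = 0.352  # correlation quoted in the answers above

# One SD below the mean drop, then three SDs above it.
print(round(predicted_z(R_COASTER, -1), 3))  # -0.352
print(round(predicted_z(R_COASTER, 3), 3))   # 1.056
```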

19 Chapter 8 Part I Answers According to the linear model, the duration of a coaster ride is expected to increase by about 0.242 seconds for each additional foot of initial drop.

20 Chapter 8 Part I Answers According to the linear model, a coaster with a 200 foot initial drop is expected to last 139.433 seconds.

21 Chapter 8 Part I Answers According to the linear model, a coaster with a 150 foot initial drop is expected to last 127.333 seconds. i. The advertised duration is shorter, at 120 seconds: 120 – 127.333 = -7.333, so the ride is 7.333 seconds shorter than predicted. ii. This is a negative residual (the model overestimates the duration).
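The prediction and residual above can be reproduced with a short sketch. The slope of 0.242 seconds per foot comes from the answers. The intercept is not stated in this transcript, so the value 91.033 below is an assumption, back-calculated from the 200 foot prediction (139.433 − 0.242 × 200).

```python
SLOPE = 0.242       # seconds per foot, from the answers above
INTERCEPT = 91.033  # ASSUMED: back-calculated as 139.433 - 0.242 * 200

def predict_duration(drop_ft):
    """Predicted ride duration (seconds) for an initial drop in feet."""
    return INTERCEPT + SLOPE * drop_ft

def residual(actual, predicted):
    """Residual = actual - predicted; negative means the model overestimated."""
    return actual - predicted

pred_150 = predict_duration(150)
print(round(pred_150, 3))                 # 127.333
print(round(residual(120, pred_150), 3))  # -7.333
```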

22 Chapter 8 Part II Answers The linear model is appropriate. Although the relationship is not strong, it is reasonably straight, and the residuals plot shows no pattern.

23 Chapter 8 Part II Answers 33.3% of the variability in attendance can be explained by variability in the number of wins. (In other words, 33.3% of the variability in attendance can be explained by the model.)

24 Chapter 8 Part II Answers

25 A team that is two standard deviations above the mean in number of wins would be expected to have attendance that is 1.154 (or 2 × 0.577) standard deviations above the mean attendance.

26 Chapter 8 Part II Answers A team that is one standard deviation below the mean in attendance would be expected to have a number of wins that is 0.577 standard deviations (in other words, r standard deviations) below the mean number of wins. The correlation between two variables is the same regardless of the direction in which predictions are made. Be careful, though, since the same is NOT true for predictions made using the slope of the regression equation. Slopes are valid only for predictions in the direction for which they were intended.
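The warning about slopes can be made concrete with the standard least-squares formulas. The notation here is an assumption, since the slides do not write these out: s_x and s_y are the standard deviations of wins and attendance, and b is the slope for the direction shown in its subscript.

```latex
\hat{z}_y = r\,z_x, \qquad \hat{z}_x = r\,z_y
\qquad\text{but}\qquad
b_{y \mid x} = r\,\frac{s_y}{s_x}, \qquad b_{x \mid y} = r\,\frac{s_x}{s_y}
```

The product of the two slopes is r², so one slope is the reciprocal of the other only when r = ±1. With r = 0.577, inverting the attendance-on-wins slope would not give the right slope for predicting wins from attendance.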

27 Chapter 8 Part II Answers

28 The model predicts that a team with 50 wins will have attendance of 31,653.72 people.

29 Chapter 8 Part II Answers For each additional win, the model predicts an increase in attendance of 517.609 people.

30 Chapter 8 Part II Answers A negative residual means that the team's actual attendance is lower than the attendance the model predicts for a team with as many wins. The model overestimated attendance for that number of wins.

31 Chapter 8 Part II Answers The predicted attendance for the Cardinals was 28,030.457. The actual attendance of 38,988 gives a residual of 38,988 – 28,030.457 = 10,957.543. The Cardinals had almost 11,000 more people attending than the model predicted.

32 Chapter 8 Part II Answers -3.9222534 -3.9222534

33 Chapter 8 Part II Answers The model predicts that a student with an SAT score of 0 would have a GPA of -1.262. The y-intercept is not meaningful in this context, since both scores are impossible.

34 Chapter 8 Part II Answers The model predicts that students who score 100 points higher on the SAT tend to have a GPA that is 0.214 higher.

35 Chapter 8 Part II Answers According to the model, a student with an SAT score of 2100 is expected to have a GPA of 3.23.

36 Chapter 8 Part II Answers According to the model, SAT score is not a very good predictor of college GPA. R² = (0.47)² = 0.2209, which means that only 22.09% of the variability in GPA can be predicted by the model. The rest of the variability is determined by other factors.
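The SAT answers above fit together as one linear model. A small sketch, assuming the intercept is the -1.262 quoted for an SAT score of 0 and the slope is 0.214 GPA points per 100 SAT points (0.00214 per point); the names are illustrative:

```python
INTERCEPT = -1.262   # predicted GPA at SAT = 0, from the answers above
SLOPE = 0.214 / 100  # GPA points per SAT point (0.214 per 100 points)
R = 0.47             # correlation quoted in the answers above

def predict_gpa(sat):
    """Predicted college GPA for a given SAT score."""
    return INTERCEPT + SLOPE * sat

print(round(predict_gpa(2100), 2))  # 3.23
print(round(R ** 2, 4))             # 0.2209
```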

37 Chapter 8 Part II Answers A student would prefer to have a positive residual. A positive residual means that the student’s actual GPA is higher than the model predicts for someone with the same SAT score.

38 Chapter 8 Part II Answers -274.59712 -274.59712

39 Chapter 8 Part II Answers -274.59712 -274.59712

40 Leverage

41 Point with high leverage Regression Line Invisible Dead Weight

42 Unusual Point LEVERAGE? RESIDUAL? INFLUENTIAL? HIGH LOW HIGH LOW

43 Leverage? LEVERAGE? RESIDUAL? INFLUENTIAL? HIGH (w/o) LOWer (w/) HIGH

44 Leverage? LEVERAGE? RESIDUAL? INFLUENTIAL? LOW HIGH LOW HIGH LOW

45 Leverage? LEVERAGE? RESIDUAL? INFLUENTIAL? LOW HIGH LOW HIGH LOW

46 Leverage? LEVERAGE? RESIDUAL? INFLUENTIAL? LOW HIGH LOW HIGH LOW
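The slides treat leverage visually. For a one-variable regression it can also be computed with the standard formula h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which depends only on how far a point's x-value lies from the mean of x. That formula is not stated in the slides, and the data below are made up for illustration.

```python
def leverages(xs):
    """Leverage of each point in a simple (one-predictor) regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

xs = [1, 2, 3, 4, 5, 20]  # hypothetical data: x = 20 sits far from the rest
for x, h in zip(xs, leverages(xs)):
    print(x, round(h, 3))
```

The point at x = 20 gets by far the largest leverage. High leverage alone does not make a point influential; that also depends on whether the point lines up with the pattern of the other data.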

47 Interpolation vs. Extrapolation 1) Extrapolation – Predicting a value for the response variable of a linear model using an explanatory value that is outside the range of data used to create the model. - This can be useful, but must be done with caution. 2) Interpolation – Predicting a value for the response variable of a linear model using an explanatory value that is inside the range of data used to create the model. - Safer than extrapolation.

48 Interpolation vs. Extrapolation Interpolation or Extrapolation? 1) Predicting the price of a property in Monopoly that would be 50 spaces away from “GO”. Extrapolation: the data covered only 1 to 39 spaces from “GO”.

49 Interpolation vs. Extrapolation Interpolation or Extrapolation? 2) Predicting the price of a 500 mile flight. Interpolation: the data covered 327 to 2150 miles. 3) Predicting the price of a 200 mile flight. Extrapolation: the data covered only 327 to 2150 miles.
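The three examples come down to one check: is the explanatory value inside the range of the data used to fit the model? A minimal sketch of that check, using the ranges quoted in the examples (the function name is illustrative):

```python
def prediction_type(x, x_min, x_max):
    """Classify a prediction by whether x lies within the fitted data range."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"

print(prediction_type(50, 1, 39))       # Monopoly, 50 spaces: extrapolation
print(prediction_type(500, 327, 2150))  # 500 mile flight: interpolation
print(prediction_type(200, 327, 2150))  # 200 mile flight: extrapolation
```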

51 Chapter 9 Vocabulary 1) Extrapolation – Predicting a value for the response variable of a linear model using an explanatory value that is outside the range of data used to create the model. - This can be useful, but must be done with caution. 2) Interpolation – Predicting a value for the response variable of a linear model using an explanatory value that is inside the range of data used to create the model. - Safer than extrapolation.

52 Chapter 9 Vocabulary 4) Influential point – a point that, if removed from the data, results in a very different regression model.
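This definition suggests a direct test: fit the line with the point, fit it again without the point, and compare. The sketch below does that with an ordinary least-squares fit in plain Python. The data are made up for illustration, and how large a change counts as "very different" is a judgment call the slides leave open.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_without(xs, ys, i):
    """Refit the line after dropping the point at index i."""
    return fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])

# Hypothetical data: five points on y = 2x plus one far-off point at x = 20.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

slope_all, _ = fit_line(xs, ys)
slope_dropped, _ = fit_without(xs, ys, 5)
print(round(slope_all, 3), round(slope_dropped, 3))
```

Here the slope changes from nearly flat to 2 once the last point is dropped, so that point is influential in the sense defined above.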

