AP Statistics Objectives Ch9
• Understand how to determine whether a data point is influential.
• Understand the difference between extrapolation and interpolation.
• Understand that lurking variables could influence an apparent association.

Vocabulary
• Subset
• Outlier
• Leverage
• Influential point
• Lurking variable
• Extrapolation

• Leverage & Influential Points
• Vocabulary
• Chapter 9 Assignment
• Chapter 8 Part I Answers
• Chapter 8 Part II Answers

Chapter 9 Assignment pp #10,12,14,16

Chapter 8 Part I Answers The explanatory variable (x) is initial drop, measured in feet, and the response variable (y) is duration, measured in seconds.

The units of the slope are seconds per foot.

The slope of the regression line predicting duration from initial drop should be positive. Coasters with higher initial drops probably provide longer rides.

12.4% of the variability in duration can be explained by variability in initial drop. (In other words, 12.4% of the variability in duration can be explained by the linear model.)
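The percentage above is R², the coefficient of determination. As a sketch with small made-up data (not the coaster data), the Python below computes R² as 1 − SSE/SST for a least-squares line and checks that it equals r². It also recovers r = √0.124 ≈ 0.352, assuming the slope is positive as argued above. The helper names are illustrative only.

```python
import math

def mean(values):
    return sum(values) / len(values)

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def correlation(xs, ys):
    """Pearson correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """Fraction of the variability in y explained by the line: 1 - SSE/SST."""
    b0, b1 = least_squares(xs, ys)
    y_bar = mean(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 4]
print(r_squared(xs, ys), correlation(xs, ys) ** 2)  # the two values agree

# Going the other way: R^2 = 12.4% with a positive slope gives r = +sqrt(0.124).
print(round(math.sqrt(0.124), 3))  # 0.352
```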


The curved pattern in the residuals plot indicates that the linear model is not appropriate: the relationship is not linear.

A residuals plot with no pattern, just random scatter, indicates that a linear model is appropriate.
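To see where a curved residuals pattern comes from, here is a minimal sketch that fits a least-squares line to hypothetical data following y = x² and lists the residuals (observed minus predicted). The data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(xs, ys):
    """Residual = observed y - predicted y, for each point."""
    b0, b1 = least_squares(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical curved data (y = x^2): a straight line is the wrong model here.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]
print(residuals(xs, ys))
# [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]: positive at both ends and negative
# in the middle, the curved pattern that says a linear model is not appropriate.
```

Plotting these residuals against x would show the U shape directly; with truly linear data the signs would instead be mixed with no trend.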

The duration of a coaster whose initial drop is one standard deviation below the mean drop would be predicted to be about 0.352 standard deviations (in other words, r standard deviations) below the mean duration.

The duration of a coaster whose initial drop is three standard deviations above the mean drop would be predicted to be about 1.056 (or 3 × 0.352) standard deviations above the mean duration.
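Both answers use the standardized form of the regression line, ẑ_y = r · z_x. A small sketch, taking r = 0.352 from the coaster data; the function name is illustrative.

```python
def predicted_z(r, z_x):
    """Predicted z-score of the response: z_hat_y = r * z_x."""
    return r * z_x

r = 0.352  # coaster correlation: the positive square root of R^2 = 0.124
print(round(predicted_z(r, -1), 3))  # -0.352: about 0.352 SD below the mean duration
print(round(predicted_z(r, 3), 3))   # 1.056: about 1.056 SD above the mean duration
```

Because |r| < 1, the predicted response is always fewer standard deviations from its mean than the explanatory value is from its own mean.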

According to the linear model, the duration of a coaster ride is expected to increase by about seconds for each additional foot of initial drop.

According to the linear model, a coaster with a 200-foot initial drop is expected to last seconds.
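The slope and intercept come from summary statistics: b₁ = r · s_y / s_x and b₀ = ȳ − b₁x̄. The sketch below uses r = 0.352, but the means and standard deviations are hypothetical, so its outputs are not the answers to the exercise.

```python
def line_from_summary(r, x_bar, s_x, y_bar, s_y):
    """Least-squares intercept and slope from summary statistics."""
    slope = r * s_y / s_x              # response units per explanatory unit
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope

def predict(intercept, slope, x):
    return intercept + slope * x

# Hypothetical means and SDs, NOT the values from the coaster exercise.
b0, b1 = line_from_summary(r=0.352, x_bar=150.0, s_x=50.0, y_bar=120.0, s_y=30.0)
print(round(b1, 4))                       # 0.2112 seconds of duration per foot of drop
print(round(predict(b0, b1, 200.0), 1))   # 130.6 seconds predicted for a 200-foot drop
```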

According to the linear model, a coaster with a 150-foot initial drop is expected to last seconds.
i. The advertised duration is shorter, at 120 seconds. 120 – = So, seconds less than predicted.
ii. This is a negative residual: the model overestimates the duration.

Chapter 8 Part II Answers The linear model is appropriate. Although the relationship is not strong, it is reasonably straight, and the residuals plot shows no pattern.

33.3% of the variability in attendance can be explained by variability in the number of wins. (In other words, 33.3% of the variability in attendance can be explained by the model.)

A team that is two standard deviations above the mean in number of wins would be expected to have attendance that is about 1.154 (or 2 × 0.577) standard deviations above the mean attendance.

A team that is one standard deviation below the mean in attendance would be expected to have a number of wins that is about 0.577 standard deviations (in other words, r standard deviations) below the mean number of wins. The correlation between two variables is the same regardless of the direction in which predictions are made. Be careful, though: the same is NOT true for predictions made using the slope of the regression equation. A slope is valid only for predictions in the direction for which it was intended.
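The warning about slopes can be checked numerically. In this sketch, r = 0.577 comes from the attendance example, while the two standard deviations are hypothetical. The slope for predicting attendance from wins is r · s_att / s_wins, the slope for the reverse direction is r · s_wins / s_att, and their product is r², not 1.

```python
def slope(r, s_from, s_to):
    """Least-squares slope for predicting 'to' from 'from': r * s_to / s_from."""
    return r * s_to / s_from

r = 0.577                      # wins vs. attendance (square root of 0.333)
s_wins, s_att = 10.0, 5000.0   # hypothetical standard deviations

b_att_from_wins = slope(r, s_wins, s_att)  # people per additional win
b_wins_from_att = slope(r, s_att, s_wins)  # wins per additional person

# Inverting one slope does NOT give the other unless r is +1 or -1.
print(b_att_from_wins, 1 / b_wins_from_att)
# The product of the two slopes is r ** 2.
print(b_att_from_wins * b_wins_from_att, r ** 2)
```

In z-scores, by contrast, the rule ẑ = r · z has the same form in both directions, which is why the correlation answer above works either way.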

The model predicts that a team with 50 wins will have attendance of 31, people.

For each additional win, the model predicts an increase in attendance of people.

A negative residual means that the team’s actual attendance is lower than the model predicts for a team with as many wins: the model overestimated attendance for that number of wins.
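Residual = actual − predicted, so its sign tells which way the model missed. A minimal sketch with hypothetical numbers; the helper names are illustrative.

```python
def residual(actual, predicted):
    """Residual = actual - predicted."""
    return actual - predicted

def describe(res):
    """Interpret the sign of a residual."""
    if res > 0:
        return "positive: actual is higher than predicted (model underestimated)"
    if res < 0:
        return "negative: actual is lower than predicted (model overestimated)"
    return "zero: the point lies on the line"

# Hypothetical attendance figures, for illustration only.
print(describe(residual(actual=25000, predicted=27000)))
# negative: actual is lower than predicted (model overestimated)
```

The same reasoning explains why a student would prefer a positive GPA residual: it means doing better than the model predicts.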

The predicted attendance for the Cardinals was 28, The actual attendance of 38,988 gives a residual of 38,988 – 28, = 10, The Cardinals had almost 11,000 more people attending than the model predicted.

The model predicts that a student with an SAT score of 0 would have a GPA of The y-intercept is not meaningful in this context, since both scores are impossible.

The model predicts that students who scored 100 points higher on the SAT tended to have a GPA that was higher

According to the model, a student with an SAT score of 2100 is expected to have a GPA of 3.23.

According to the model, SAT score is not a very good predictor of college GPA. R² = (0.47)² = 0.2209, which means that only 22.09% of the variability in GPA can be explained by the model. The remaining 77.91% is due to other factors.

A student would prefer to have a positive residual. A positive residual means that the student’s actual GPA is higher than the model predicts for someone with the same SAT score.

Leverage

[Figure: point with high leverage and the regression line ("invisible dead weight")]

[Figures: several unusual points, each classified with the questions Leverage? Residual? Influential? (HIGH or LOW). For one point the answers read: Residual HIGH without the point in the fit and LOWer with it; Influential HIGH.]
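A common way to put a number on leverage in simple linear regression (not given in the slides) is the hat value h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which depends only on the x-values. A sketch with hypothetical x-values:

```python
def leverages(xs):
    """Leverage (hat value) of each point in simple linear regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
    It depends only on the x-values, not on y."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

xs = [1, 2, 3, 4, 5, 20]  # hypothetical; the last x is far from the rest
for x, h in zip(xs, leverages(xs)):
    print(x, round(h, 3))
```

Here the point at x = 20 gets leverage of about 0.967, far above the others. High leverage alone does not make a point influential: a high-leverage point lying close to the line through the other points barely changes the fit.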

Interpolation vs. Extrapolation
1) Extrapolation – Predicting a value of the response variable of a linear model using an explanatory value that is outside the range of data used to create the model.
- This can be useful, but must be done with caution.
2) Interpolation – Predicting a value of the response variable of a linear model using an explanatory value that is inside the range of data used to create the model.
- Safer than extrapolation.

Interpolation or Extrapolation?
1) Predicting the price of a property in Monopoly that would be 50 spaces away from “GO”. Extrapolation: the data covered only 1 to 39 spaces from “GO”.

2) Predicting the price of a 500-mile flight. Interpolation: the data covered 327 to 2150 miles.
3) Predicting the price of a 200-mile flight. Extrapolation: the data covered only 327 to 2150 miles.
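The check behind these answers is whether the new explanatory value lies within the range of x-values used to fit the model. A sketch, using only the range endpoints quoted in the examples; the function name is illustrative.

```python
def prediction_type(x_new, xs):
    """'interpolation' if x_new lies within [min(xs), max(xs)], else 'extrapolation'."""
    return "interpolation" if min(xs) <= x_new <= max(xs) else "extrapolation"

flight_miles = [327, 2150]  # only the range endpoints from the example are known
print(prediction_type(500, flight_miles))  # interpolation
print(prediction_type(200, flight_miles))  # extrapolation

monopoly_spaces = [1, 39]
print(prediction_type(50, monopoly_spaces))  # extrapolation
```

Being inside the range is only a minimum requirement: a prediction there can still be poor if the relationship is not linear or the data are sparse in that region.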



Chapter 9 Vocabulary
4) Influential point – a point that, if removed from the data, results in a very different regression model.
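The definition suggests a direct check: refit the model without each point and see how much it changes. This sketch compares only the slope, which is one reasonable yardstick among several (Cook's distance is a formal alternative not covered here); the data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def slope_change_without(xs, ys, i):
    """Slope with all points minus slope with point i removed."""
    _, b_all = least_squares(xs, ys)
    _, b_without = least_squares(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return b_all - b_without

# Hypothetical data: the last point is far out in x and far off the trend.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 0]
for i in range(len(xs)):
    print(xs[i], ys[i], round(slope_change_without(xs, ys, i), 3))
```

Removing the last point moves the slope from about −0.26 to exactly 2, while removing any other point changes it far less, so that point is influential: it has high leverage and does not follow the pattern of the rest.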
