Published by Sheldon Nesbit. Modified over 10 years ago.
2
Influential Points and Outliers
Debbi Amanti
3
OUTLIERS:
Data points that lie more than two or three standard deviations from the mean of the data.
Observations that differ significantly from the pattern of the REST OF THE DATA.
Observations that lie outside the overall pattern of the other observations.
4
OUTLIERS IN TERMS OF REGRESSION:
Observations with large (in absolute value) residuals.
Observations that fall far from the regression line and do not follow the pattern of the relationship apparent in the others.
Residual = actual - fitted
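The residual definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the data values and the helper names fit_line and residuals are made up for the example and do not come from the presentation.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Residual = actual - fitted, one value per observation."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: the fifth point sits well above the trend of the others.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.0, 8.1, 15.0, 12.0]

res = residuals(xs, ys)
# Index of the observation with the largest residual in absolute value.
largest = max(range(len(res)), key=lambda i: abs(res[i]))
print(largest, round(res[largest], 2))
```

The observation with the largest absolute residual is the first candidate for an outlier with respect to the regression model. How large is "large" remains a judgment call.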
5
To mathematically identify an outlier in a univariate set of data:
Find the interquartile range, a.k.a. IQR (Q3 - Q1), and multiply this value by 1.5.
An outlier for a data set is any point:
Greater than Q3 + 1.5*(IQR)
Less than Q1 - 1.5*(IQR)
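A minimal sketch of the 1.5*IQR rule above, assuming Python's standard statistics module. The function name iqr_outliers and the sample data are hypothetical. Textbooks and calculators compute quartiles in slightly different ways, so the fences may shift a little depending on the method; the "inclusive" method used here is one choice among several.

```python
import statistics


def iqr_outliers(data):
    """Return the points lying outside the 1.5*IQR fences."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical data set with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(iqr_outliers(sample))
```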
6
INFLUENTIAL POINTS ARE:
Points whose removal would greatly affect the association of two variables.
Points whose removal would significantly change the slope of an LSR (least-squares regression) line.
Points with a large moment (i.e., they are far away from the rest of the data).
Usually outliers in the x direction.
7
The two graphs below show the same data – the one on the right with the removal of the green data point. As you can see, the removal of this point significantly affects the slope of the regression line. This is an influential point!
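The effect of removing a point can also be checked numerically. The sketch below is an assumption-laden illustration: the data are invented (the values behind the graphs are not given here), and the helper names slope and slope_without are hypothetical.

```python
def slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_without(xs, ys, index):
    """Slope after dropping the observation at the given index."""
    kept_x = [x for i, x in enumerate(xs) if i != index]
    kept_y = [y for i, y in enumerate(ys) if i != index]
    return slope(kept_x, kept_y)


# Hypothetical data: the last point lies far to the right of the others.
xs = [1, 2, 3, 4, 5, 15]
ys = [3, 5, 4, 6, 5, 2]

slope_all = slope(xs, ys)
slope_dropped = slope_without(xs, ys, 5)
print(round(slope_all, 3), round(slope_dropped, 3))
```

With these made-up numbers the slope changes sign when the far-right point is dropped, which is the behavior the slide describes for an influential point.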
8
Using the same data as shown on the previous slide, let's compare the x and y data sets for the presence of outliers:
X DATA
IQR=5 Q1=3 Q3=8 MAX=15.5 MIN=1
An outlier is any point: > Q3 + 1.5*IQR = 15.5 or < Q1 - 1.5*IQR = -4.5
THERE ARE NO OUTLIERS IN THIS DATA SET!!!
Y DATA
IQR=5 Q1=4 Q3=9 MAX=10 MIN=2
An outlier is any point: > Q3 + 1.5*IQR = 16.5 or < Q1 - 1.5*IQR = -3.5
THERE ARE NO OUTLIERS IN THIS DATA SET!!!
9
!!!REMEMBER!!! An observation does NOT have to be an Outlier to be an Influential Point!! Nor does an observation need to be an Influential Point in order to be an Outlier!!
10
Get your calculator handy...
11
Given the five-number summary {8 21 35 43 77}, which of the following is correct?
A. There are no outliers
B. There are at least two outliers
C. There is not enough data to make any conclusion
D. There is exactly one outlier
E. There is at least one outlier
12
The correct answer is E
The five-number summary gives you {Min Q1 Median Q3 Max}
The IQR is calculated by Q3 - Q1
So, the IQR for the given data is 43 - 21 = 22
An outlier for this data would be:
> Q3 + 1.5*IQR or < Q1 - 1.5*IQR
> 43 + (22*1.5) = 76 or < 21 - (22*1.5) = -12
Since the max is 77, which is greater than 76, there must be at least one outlier in this data set, but we cannot conclude how many outliers without more data.
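The same check can be done from a five-number summary alone. This is a small sketch, and the function names fences_from_summary and has_outlier are hypothetical. It can only say whether the min or max falls beyond a fence, not how many outliers there are.

```python
def fences_from_summary(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def has_outlier(minimum, q1, q3, maximum):
    """True if the min or the max lies strictly beyond a fence."""
    lower, upper = fences_from_summary(q1, q3)
    return minimum < lower or maximum > upper


# Five-number summary from the question: {8 21 35 43 77}
print(fences_from_summary(21, 43))   # lower and upper fences
print(has_outlier(8, 21, 43, 77))    # the max of 77 exceeds the upper fence

# X data from the earlier slide: Q1=3, Q3=8, MIN=1, MAX=15.5
print(has_outlier(1, 3, 8, 15.5))    # 15.5 is not greater than the fence 15.5
```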
13
Given the following scatterplot and residual plot, which of the following is true about the yellow data point?
I. It is an influential point
II. It is an outlier with respect to the regression model
III. It appears to be an outlier in the x direction
A. I only
B. I and II
C. I and III
D. None of the above
E. All of the above
14
The correct answer is C
I. Because this point has a large moment and is far from the rest of the data, it is an influential point. If this point were removed, the slope of the line would change markedly.
II. This point is not an outlier with respect to the model because, as you can see in the residual plot, it does not have a large residual (it follows the regression pattern of the data).
III. By looking at both the scatterplot and the residual plot, you can see that the yellow point is an outlier in the x direction (far to the right of the rest of the data).
15
Resources used in this presentation include:
Workshop Statistics by Allan Rossman
The Basic Practice of Statistics by David S. Moore
AMSCO's AP Statistics by James Bohan
Any further questions, email me at: debora_amanti@bbns.org