1
Three Measures of Influence
Lecture 16 Outline:
1. Review of Lecture 15
2. Masking and Swamping Problems
3. Three Measures of Influence

2/22/2019 ST3131, Lecture 16
2
Review of Lecture 15: Leverage, Influence, and Outliers
1. High leverage points / outliers in the predictor variables (X-direction): observations with larger leverage values $p_{ii}$ (the i-th diagonal element of the hat matrix $P = X(X^TX)^{-1}X^T$) are called high leverage points. High leverage points are also called outliers in the predictor variables.
2. Outliers in the response variable (Y-direction): observations whose standardized residuals exceed 2 or 3 in absolute value are usually called outliers.
3. Influential points: a point is an influential point if its deletion, singly or in combination with others (2 or 3), causes substantial changes in the fitted model (estimates, fitted values, t-tests, etc.).
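A minimal NumPy sketch of the first two quantities reviewed above: the leverage values $p_{ii}$ and the standardized residuals $r_i = e_i/(\hat\sigma\sqrt{1-p_{ii}})$. The function name, the toy data, and the leverage cutoff $2(p+1)/n$ are illustrative assumptions, not taken from the lecture; only the $|r_i| > 2$ rule comes from the slide.

```python
import numpy as np

def leverage_and_std_residuals(X, y):
    """Leverage values p_ii and standardized residuals r_i of an OLS fit.

    X holds the predictor values (1-D for one predictor, or n x p);
    an intercept column is added, so the model has p + 1 coefficients.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])       # design matrix with intercept
    k = Z.shape[1]                             # p + 1
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # hat matrix
    p_ii = np.diag(P)                          # leverage values
    e = y - P @ y                              # ordinary residuals
    sigma_hat = np.sqrt(e @ e / (n - k))       # residual standard error
    r = e / (sigma_hat * np.sqrt(1 - p_ii))    # standardized residuals
    return p_ii, r

# Hypothetical data: the last x value lies far from the others.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 9.9, 12.0])
p_ii, r = leverage_and_std_residuals(x, y)

k = 2                                                # p + 1 with one predictor
high_leverage = np.where(p_ii > 2 * k / len(y))[0]   # assumed rule of thumb 2(p+1)/n
outliers = np.where(np.abs(r) > 2)[0]                # |r_i| > 2, as on the slide
```

The leverage values always sum to $p+1$ (the trace of the hat matrix), so "large" is judged relative to the average $(p+1)/n$.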
3
Masking and Swamping Problems
Standardized residuals provide useful information for validating the linearity and normality assumptions and for identifying outliers. However, these methods may fail to detect outliers and influential observations in the presence of high leverage points. The ordinary residuals $e_i$ and the leverage values $p_{ii}$ satisfy

$$p_{ii} + \frac{e_i^2}{SSE} \le 1, \qquad SSE = \sum_{j=1}^{n} e_j^2.$$

This implies that high leverage points (with $p_{ii}$ close to 1) tend to have small residuals. Thus, methods based on standardized residuals may fail to detect outliers that are also high leverage points.
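A quick numerical check of the inequality $p_{ii} + e_i^2/SSE \le 1$ on synthetic data (the data and seed are arbitrary assumptions). One way to see why it holds: $p_{ii} + e_i^2/SSE$ is exactly the i-th diagonal element of the hat matrix of the augmented matrix $[X : y]$, and every diagonal element of a hat matrix is at most 1. The sketch verifies both facts.

```python
import numpy as np

rng = np.random.default_rng(0)                 # synthetic data, for illustration only
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix of X
p_ii = np.diag(P)                              # leverage values
e = y - P @ y                                  # ordinary residuals
sse = e @ e
bound = p_ii + e**2 / sse                      # left-hand side of the inequality

# Hat matrix of the augmented matrix Z = [X : y]; its diagonal equals `bound`.
Z = np.column_stack([X, y])
Hz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
print(bound.max())
```

Because the two terms share a budget of 1, a point with $p_{ii}$ near 1 is forced to have $e_i^2/SSE$ near 0, which is the masking mechanism described above.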
4
The masking and swamping problems
Masking happens when we fail to detect some outliers that are hidden by other outliers. Swamping happens when we “detect” some non-outliers as outliers.
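A small synthetic illustration of masking (the data are hypothetical, not from the lecture): ten points lie exactly on $y = x$, and two identical outliers sit at $(20, 0)$. Deleting either outlier alone barely changes the fitted slope because its twin is still there, so single-deletion diagnostics understate both; deleting the pair restores the slope of 1.

```python
import numpy as np

# Ten points on y = x, plus two identical outliers at (20, 0).
x = np.concatenate([np.arange(10.0), [20.0, 20.0]])
y = np.concatenate([np.arange(10.0), [0.0, 0.0]])

def slope(x, y):
    """OLS slope of a straight-line fit (np.polyfit returns [slope, intercept])."""
    return np.polyfit(x, y, 1)[0]

b_all = slope(x, y)
b_drop_one = slope(x[:-1], y[:-1])     # delete one outlier; its twin remains
b_drop_both = slope(x[:-2], y[:-2])    # delete both outliers together

print(f"all: {b_all:.3f}, drop one: {b_drop_one:.3f}, drop both: {b_drop_both:.3f}")
```

This is why the definition of an influential point above mentions deleting observations "in combination with others (2 or 3)".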
5
The above plots fail to detect Observation 5 as an outlier even though it is one: it is masked. Thus, other measures that can detect such outliers need to be defined.
6
Measures of Influence
The influence of an observation is measured by the effects it produces on the fit when it is deleted from the fitting process. Let $\hat\beta_{(i)}$ denote the regression coefficients obtained when the i-th observation is deleted; similarly, $\hat y_{(i)}$ and $\hat\sigma^2_{(i)}$ denote the corresponding fitted values and noise variance estimator. Influence measures look at the differences produced in quantities such as $\hat\beta - \hat\beta_{(i)}$, $\hat y - \hat y_{(i)}$, and $\hat\sigma^2 - \hat\sigma^2_{(i)}$. Three such measures are defined in the following slides.
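A brute-force sketch of the deletion idea: refit the model $n$ times, each time leaving out one observation, and record $\hat\beta_{(i)}$, $\hat y_{(i)}$ and $\hat\sigma^2_{(i)}$. The function name `deletion_fits` and the synthetic data are assumptions for illustration. The final lines check the standard closed form $\hat\beta - \hat\beta_{(i)} = (X^TX)^{-1}x_i e_i/(1-p_{ii})$, which is why the measures below can be computed without refitting.

```python
import numpy as np

def deletion_fits(X, y):
    """Refit OLS n times, leaving out observation i each time.

    X must already contain the intercept column (k = p + 1 columns).
    Returns beta_(i) as rows of an n x k array, yhat_(i) (fitted values for
    all n points from the fit without i) as rows of an n x n array, and sigma2_(i).
    """
    n, k = X.shape
    betas, yhats, sigma2s = [], [], []
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        res = y[keep] - X[keep] @ b
        betas.append(b)
        yhats.append(X @ b)
        sigma2s.append(res @ res / (n - 1 - k))   # n - 1 observations, k coefficients
    return np.array(betas), np.array(yhats), np.array(sigma2s)

rng = np.random.default_rng(1)                    # synthetic data, for illustration
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 3.0]) + rng.normal(size=n)
betas, yhats, sigma2s = deletion_fits(X, y)

# Full-data fit and the differences the influence measures are built from
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - X.shape[1])
diff_beta = beta_hat - betas                      # row i: beta_hat - beta_hat_(i)
diff_yhat = X @ beta_hat - yhats                  # row i: yhat - yhat_(i)
diff_sigma2 = sigma2_hat - sigma2s

# Closed form without refitting: (X'X)^{-1} x_i e_i / (1 - p_ii)
p_ii = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
closed_form = (XtX_inv @ X.T * (e / (1 - p_ii))).T
```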
7
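Cook’s Distance
Cook’s distance measures the influence of the i-th observation as

$$C_i = \frac{\sum_{j=1}^{n} \left(\hat y_j - \hat y_{j(i)}\right)^2}{\hat\sigma^2 (p+1)},$$

which can be expressed as

$$C_i = \frac{r_i^2}{p+1} \times \frac{p_{ii}}{1-p_{ii}},$$

where $r_i$ is the i-th standardized residual, n is the number of observations, and p is the number of predictor variables. This is a multiplicative function of the squared standardized residual and the potential function $p_{ii}/(1-p_{ii})$ of the leverage value. The first factor is large when the i-th observation is an outlier, while the second is large when the i-th observation is a high leverage point. It is suggested that observations with $C_i$ greater than $F(p+1, n-p-1, 0.5)$, the 50th percentile of the F distribution with $p+1$ and $n-p-1$ degrees of freedom, be classified as influential points. In practice, a dot plot or index plot of $C_i$ is used to flag influential points.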
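A sketch that computes $C_i$ from the closed form $r_i^2/(p+1) \cdot p_{ii}/(1-p_{ii})$ and checks it against the deletion definition by refitting. Function names and the synthetic data are illustrative assumptions. If SciPy is available, the suggested cutoff $F(p+1, n-p-1, 0.5)$ could be obtained with `scipy.stats.f.ppf(0.5, p + 1, n - p - 1)`; it is left out here to keep the sketch NumPy-only.

```python
import numpy as np

def cooks_distance(X, y):
    """C_i = r_i^2/(p+1) * p_ii/(1-p_ii); X includes the intercept column."""
    n, k = X.shape                                   # k = p + 1
    P = X @ np.linalg.solve(X.T @ X, X.T)
    p_ii = np.diag(P)
    e = y - P @ y
    sigma2 = e @ e / (n - k)
    r = e / np.sqrt(sigma2 * (1 - p_ii))             # standardized residuals
    return r**2 / k * p_ii / (1 - p_ii)

def cooks_distance_by_deletion(X, y):
    """Same quantity from the definition: refit without each observation."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    sigma2 = np.sum((y - yhat) ** 2) / (n - k)
    C = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        C[i] = np.sum((yhat - X @ b_i) ** 2) / (sigma2 * k)
    return C

rng = np.random.default_rng(2)                       # synthetic data, for illustration
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)
C = cooks_distance(X, y)
C_del = cooks_distance_by_deletion(X, y)
```

The closed form needs a single fit, which is what makes an index plot of $C_i$ cheap even for large $n$.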
8
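Welsch and Kuh Measure
DFITS is defined as

$$DFITS_i = \frac{\hat y_i - \hat y_{i(i)}}{\hat\sigma_{(i)}\sqrt{p_{ii}}},$$

which can be written as

$$DFITS_i = r_i^* \sqrt{\frac{p_{ii}}{1-p_{ii}}},$$

where $r_i^* = e_i/(\hat\sigma_{(i)}\sqrt{1-p_{ii}})$ is the i-th externally studentized residual. When $\hat\sigma_{(i)}$ is replaced by $\hat\sigma$, this measure satisfies $|DFITS_i| = \sqrt{(p+1)C_i}$. Points with $|DFITS_i|$ greater than $2\sqrt{(p+1)/(n-p-1)}$ are usually classified as influential points. In practice, a dot plot or index plot of $DFITS_i$ is used to flag influential points. Since $C_i$ and $DFITS_i$ are approximately monotone transformations of each other, they give similar answers when detecting influential points.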
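A sketch of $DFITS_i$ computed from a single fit. It relies on the standard leave-one-out identity $SSE_{(i)} = SSE - e_i^2/(1-p_{ii})$ to get $\hat\sigma_{(i)}$ without refitting, then checks observation 1 against the definition and applies the $2\sqrt{(p+1)/(n-p-1)}$ cutoff. The function name and data are illustrative assumptions.

```python
import numpy as np

def dfits(X, y):
    """DFITS_i = r*_i * sqrt(p_ii / (1 - p_ii)); X includes the intercept column."""
    n, k = X.shape                                   # k = p + 1
    P = X @ np.linalg.solve(X.T @ X, X.T)
    p_ii = np.diag(P)
    e = y - P @ y
    sse = e @ e
    # Leave-one-out variance without refitting: SSE_(i) = SSE - e_i^2 / (1 - p_ii)
    sigma2_del = (sse - e**2 / (1 - p_ii)) / (n - k - 1)
    r_ext = e / np.sqrt(sigma2_del * (1 - p_ii))     # externally studentized residuals
    return r_ext * np.sqrt(p_ii / (1 - p_ii))

rng = np.random.default_rng(3)                       # synthetic data, for illustration
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)
D = dfits(X, y)

# Check observation 1 against (yhat_1 - yhat_1(1)) / (sigma_(1) * sqrt(p_11)).
keep = np.arange(n) != 0
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
res_del = y[keep] - X[keep] @ b_del
sigma_del = np.sqrt(res_del @ res_del / (n - 1 - X.shape[1]))
p_11 = X[0] @ np.linalg.solve(X.T @ X, X[0])
D1_def = (X[0] @ b - X[0] @ b_del) / (sigma_del * np.sqrt(p_11))

k = X.shape[1]
cutoff = 2 * np.sqrt(k / (n - k))                    # 2[(p+1)/(n-p-1)]^(1/2)
flagged = np.where(np.abs(D) > cutoff)[0] + 1        # 1-based observation numbers
```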
9
Hadi’s Influence Measure
As seen above, Cook’s distance and the Welsch and Kuh measure are multiplicative functions of the standardized residuals and the potential function. Hadi’s influence measure is instead a sum of the potential function and a scaled residual function:

$$H_i = \frac{p_{ii}}{1-p_{ii}} + \frac{p+1}{1-p_{ii}} \cdot \frac{d_i^2}{1-d_i^2},$$

where $d_i = e_i/\sqrt{SSE}$ is the i-th normalized residual. The first term is large for outliers in the X-direction (high leverage points), while the second term is large for outliers in the Y-direction. The index plot of $H_i$ is often used to detect influential points.
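A sketch of Hadi’s measure on hypothetical data built to contain one point of each kind: $x = 30$ is a high leverage point lying on the true line, and the point at $x = 5$ is shifted up by 8 (a Y-direction outlier). Because $H_i$ adds the two terms rather than multiplying them, both points end up with the two largest values. The function name and data are assumptions.

```python
import numpy as np

def hadi_influence(X, y):
    """Hadi's H_i = potential term + residual term; X includes the intercept column."""
    n, k = X.shape                                   # k = p + 1
    P = X @ np.linalg.solve(X.T @ X, X.T)
    p_ii = np.diag(P)
    e = y - P @ y
    d2 = e**2 / (e @ e)                              # squared normalized residuals d_i^2
    potential = p_ii / (1 - p_ii)                    # large for X-direction outliers
    residual = k / (1 - p_ii) * d2 / (1 - d2)        # large for Y-direction outliers
    return potential + residual

# Hypothetical data: x = 30 has high leverage; the point at x = 5 is shifted up by 8.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = 1 + 0.5 * x
y[4] += 8.0
X = np.column_stack([np.ones_like(x), x])
H = hadi_influence(X, y)
top_two = np.argsort(H)[-2:] + 1                     # 1-based observation numbers
```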
10
The Potential-Residual Plot
An index plot of a single measure can be used to detect one kind of unusual observation. The potential-residual (P-R) plot can be used to detect two different kinds of unusual observations at once: high leverage points and outliers in the response. The P-R plot is obtained by plotting the potential function

$$\frac{p_{ii}}{1-p_{ii}}$$

against the scaled residual function

$$\frac{p+1}{1-p_{ii}} \cdot \frac{d_i^2}{1-d_i^2}.$$
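A sketch that computes the P-R plot coordinates (residual function on the horizontal axis, potential function on the vertical axis) for the same kind of hypothetical data: a high leverage point at $x = 30$ and a Y-outlier at $x = 5$. The pairs could be passed to any scatter-plot routine; here they are printed as a table. The two unusual points separate into different corners: one has a large potential, the other a large residual. Names and data are assumptions.

```python
import numpy as np

def pr_plot_coordinates(X, y):
    """(residual function, potential function) per observation; X includes the intercept."""
    n, k = X.shape                                   # k = p + 1
    P = X @ np.linalg.solve(X.T @ X, X.T)
    p_ii = np.diag(P)
    e = y - P @ y
    d2 = e**2 / (e @ e)                              # squared normalized residuals
    potential = p_ii / (1 - p_ii)                    # vertical axis
    residual = k / (1 - p_ii) * d2 / (1 - d2)        # horizontal axis
    return residual, potential

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = 1 + 0.5 * x
y[4] += 8.0                                          # Y-direction outlier
X = np.column_stack([np.ones_like(x), x])
res_fn, pot_fn = pr_plot_coordinates(X, y)

for i, (rx, py) in enumerate(zip(res_fn, pot_fn), start=1):
    print(f"obs {i:2d}: residual fn = {rx:7.3f}, potential fn = {py:7.3f}")
```

Note that Hadi’s $H_i$ is simply the sum of the two coordinates, so points far from the origin of the P-R plot in either direction are the ones with large $H_i$.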
11
Some observations may be flagged as high leverage points, outliers, or influential points. All such points should be carefully examined for accuracy (gross errors, transcription errors), relevancy (whether they belong to the data), and special significance (abnormal conditions, unique situations). Points with high leverage that are not influential do not cause problems. Points with high leverage that are influential should be investigated.

Examples with MLR
The above examples are based on one response Y and one predictor variable (X4) for simplicity of presentation. The above results are valid for any number of predictor variables. For the New York Rivers Data, if all 4 predictor variables are included, we can draw and analyze the same index plots in the same way.
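A sketch that computes all the quantities from this lecture for a model with several predictors, to show that nothing changes beyond the number of columns. The data below are a synthetic stand-in with 4 predictors; the New York Rivers Data are not reproduced here. The function name is an assumption; the DFITS cutoff is the one from the slides.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, Cook's distance, DFITS and Hadi's measure for an OLS fit.

    X: n x p predictor matrix without an intercept column (any p).
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])          # add intercept; k = p + 1 columns
    k = Z.shape[1]
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    h = np.diag(P)                                # leverage p_ii
    e = y - P @ y
    sse = e @ e
    r = e / np.sqrt(sse / (n - k) * (1 - h))      # standardized residuals
    sigma2_del = (sse - e**2 / (1 - h)) / (n - k - 1)
    r_ext = e / np.sqrt(sigma2_del * (1 - h))     # externally studentized residuals
    d2 = e**2 / sse                               # squared normalized residuals
    cook = r**2 / k * h / (1 - h)
    dfits = r_ext * np.sqrt(h / (1 - h))
    hadi = h / (1 - h) + k / (1 - h) * d2 / (1 - d2)
    return h, cook, dfits, hadi

rng = np.random.default_rng(4)                    # synthetic stand-in, not the Rivers data
n, p = 20, 4
X = rng.uniform(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.3, 2.0]) + 0.1 * rng.normal(size=n)
h, cook, dfits, hadi = influence_measures(X, y)

dfits_cut = 2 * np.sqrt((p + 1) / (n - p - 1))
print("flagged by |DFITS|:", np.where(np.abs(dfits) > dfits_cut)[0] + 1)
```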
12
This is the matrix plot for the New York Rivers Data. See Page 6 for the data description, and Page 10 of the textbook for the data.
13
These are residual plots.
14
These are the index plots of