Three Measures of Influence


Three Measures of Influence (ST3131, Lecture 16)

Outline:
- Review of Lecture 15
- Masking and Swamping Problems
- Three Measures of Influence

Review of Lecture 15: Leverage, Influence, and Outliers

1. High leverage points / outliers in the predictor variables (X-direction): observations with leverage values $p_{ii}$ larger than a cutoff (commonly $2(p+1)/n$) are called high leverage points. High leverage points are also called outliers in the predictor variables.
2. Outliers in the response variable (Y-direction): observations with absolute standardized residuals greater than 2 or 3 are usually called outliers.
3. Influential points: a point is an influential point if its deletion, singly or in combination with others (2 or 3), causes substantial changes in the fitted model (coefficient estimates, fitted values, t-tests, etc.).
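As a concrete illustration of these categories, here is a minimal sketch using statsmodels on synthetic data (the data, names, and cutoffs are illustrative, not the lecture's New York Rivers data); it flags high leverage points from the leverage values and Y-direction outliers from the standardized residuals.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative synthetic data: y is the response, X the predictors plus intercept.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

n = int(results.nobs)
p_plus_1 = int(results.df_model) + 1               # p predictors + intercept
leverage = influence.hat_matrix_diag               # leverage values p_ii
std_resid = influence.resid_studentized_internal   # standardized residuals

print("High leverage points (X-direction):",
      np.where(leverage > 2 * p_plus_1 / n)[0])
print("Outliers (Y-direction, |r_i| > 2):",
      np.where(np.abs(std_resid) > 2)[0])
```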

Masking and Swamping Problems

Standardized residuals provide useful information for validating the linearity and normality assumptions and for identifying outliers. However, residual-based methods may fail to detect outliers and influential observations for the following reasons:

1. The presence of high leverage points. The ordinary residual and the leverage value are related through $\mathrm{Var}(e_i) = \sigma^2 (1 - p_{ii})$, so when $p_{ii}$ is close to 1 the residual is forced toward zero. This implies that high leverage points tend to have small residuals, and standardized-residual-based methods may therefore fail to detect outliers at high leverage data points.
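To make this concrete, here is a small synthetic sketch (all numbers invented) in which a gross error is placed at an extreme X value; the high leverage shrinks the resulting residual, which is how such a point can slip past residual-based checks.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic single-predictor data: the last point sits far out in the
# X-direction and carries a gross error in Y.
rng = np.random.default_rng(1)
x = np.concatenate([rng.uniform(0, 10, size=20), [100.0]])
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=21)
y[-1] -= 30.0                       # gross error at the high leverage position

results = sm.OLS(y, sm.add_constant(x)).fit()
influence = results.get_influence()

print("leverage p_ii of last point:", influence.hat_matrix_diag[-1])
print("ordinary residual of last point:", results.resid[-1])
# Because p_ii is close to 1, the 30-unit error is shrunk to a residual of
# roughly (1 - p_ii) * 30 plus noise: the high leverage point drags the
# fitted line toward itself, so residual-based checks can easily miss it.
```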

2. The masking and swamping problems. Masking happens when we fail to detect some outliers because they are hidden by other outliers. Swamping happens when we "detect" some non-outliers as outliers.

The plots above fail to flag Observation 5 as an outlier because it is masked by the other observations. Thus, other measures are needed that can detect such outliers.

Measures of Influence

The influence of an observation is measured by the effects it produces on the fit when it is deleted from the fitting process. Let $\hat{\beta}_{(i)}$ denote the regression coefficients obtained when the $i$-th observation is deleted; likewise, $\hat{y}_{(i)}$ and $\hat{\sigma}^2_{(i)}$ denote the corresponding fitted values and noise variance estimator. Influence measures look at the differences these deletions produce in quantities such as $\hat{\beta} - \hat{\beta}_{(i)}$ and $\hat{y} - \hat{y}_{(i)}$. Three such measures are defined in the following slides.
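The deletion idea can be sketched directly by refitting the model with each observation left out; this minimal example (synthetic data, illustrative names) reports which single deletion moves the coefficients the most.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(25, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=25)

full = sm.OLS(y, X).fit()
n = len(y)

# Leave-one-out refits: how much do the coefficients change when row i is dropped?
coef_changes = np.empty((n, X.shape[1]))
for i in range(n):
    keep = np.delete(np.arange(n), i)
    fit_i = sm.OLS(y[keep], X[keep]).fit()
    coef_changes[i] = full.params - fit_i.params   # beta_hat - beta_hat_(i)

# Largest coefficient shift caused by deleting a single observation.
worst = np.argmax(np.abs(coef_changes).max(axis=1))
print("Most influential observation (by coefficient change):", worst)
```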

Cook's Distance

Cook's distance measures the influence of the $i$-th observation as
$$C_i = \frac{(\hat{y} - \hat{y}_{(i)})^{\top}(\hat{y} - \hat{y}_{(i)})}{(p+1)\,\hat{\sigma}^2},$$
which can be expressed as
$$C_i = \frac{r_i^2}{p+1} \cdot \frac{p_{ii}}{1 - p_{ii}},$$
where $r_i$ is the standardized residual and $p_{ii}$ the leverage of the $i$-th observation. This is a multiplicative function of the squared standardized residual and the potential function of the leverage value. The first factor is large when the $i$-th observation is an outlier, while the second is large when it is a high leverage point. It is suggested that observations with $C_i$ greater than $F(p+1, n-p-1; 0.5)$, the median of the F distribution with $p+1$ and $n-p-1$ degrees of freedom, be classified as influential points. In practice, a dot plot or index plot of $C_i$ is used to flag influential points.
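A minimal sketch, assuming statsmodels and synthetic illustrative data, of computing $C_i$ and applying the F-median cutoff described above:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Same kind of illustrative fit as in the earlier sketches.
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

cooks_d = influence.cooks_distance[0]          # C_i for each observation
n = int(results.nobs)
p_plus_1 = int(results.df_model) + 1           # p predictors + intercept

# Suggested cutoff: the median of the F(p+1, n-p-1) distribution.
cutoff = stats.f.ppf(0.5, p_plus_1, n - p_plus_1)
print("Influential points (C_i > F median):",
      np.where(cooks_d > cutoff)[0])
```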

Welsch and Kuh Measure

The Welsch and Kuh measure, DFITS, is defined as
$$\mathrm{DFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\hat{\sigma}_{(i)}\sqrt{p_{ii}}},$$
which can be written as
$$\mathrm{DFITS}_i = r_i^{*}\sqrt{\frac{p_{ii}}{1 - p_{ii}}},$$
where $r_i^{*}$ is the externally studentized residual. When $\hat{\sigma}_{(i)}$ is replaced by $\hat{\sigma}$, this measure equals $\sqrt{(p+1)\,C_i}$ in magnitude. Points with $|\mathrm{DFITS}_i|$ greater than $2\sqrt{(p+1)/(n-p-1)}$ are usually classified as influential points. In practice, a dot plot or index plot of $\mathrm{DFITS}_i$ is used to flag influential points. $C_i$ and $\mathrm{DFITS}_i$ are approximately monotone transformations of each other and hence give similar answers when detecting influential points.
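A minimal sketch, assuming statsmodels (whose OLSInfluence exposes DFFITS) and the same kind of synthetic data, applying the cutoff from the slide:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

dffits, _ = influence.dffits                   # DFITS_i (second item is statsmodels' own threshold)
n = int(results.nobs)
p_plus_1 = int(results.df_model) + 1

# Cutoff from the slide: 2 * sqrt((p+1) / (n - p - 1)).
cutoff = 2 * np.sqrt(p_plus_1 / (n - p_plus_1))
print("Influential points (|DFITS_i| > cutoff):",
      np.where(np.abs(dffits) > cutoff)[0])
```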

Hadi's Influence Measure

As seen above, Cook's distance and the Welsch and Kuh measure are multiplicative functions of the standardized residuals and the potential function. Hadi's influence measure is instead a sum of the potential function and a scaled residual term, defined as
$$H_i = \frac{p_{ii}}{1 - p_{ii}} + \frac{p+1}{1 - p_{ii}} \cdot \frac{d_i^2}{1 - d_i^2},$$
where $d_i = e_i / \sqrt{\mathrm{SSE}}$ is the normalized residual. The first term is large for outliers in the X-direction (high leverage points), while the second term is large for outliers in the Y-direction. The index plot of $H_i$ is often used to detect influential points.
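As far as I know, Hadi's measure is not built into statsmodels, so the sketch below assembles it by hand from the leverage values and residuals, following the formula above; the data are synthetic and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

p_ii = influence.hat_matrix_diag               # leverage values
e = results.resid                              # ordinary residuals
d = e / np.sqrt(np.sum(e ** 2))                # normalized residuals, d_i = e_i / sqrt(SSE)
p_plus_1 = int(results.df_model) + 1

potential = p_ii / (1 - p_ii)
scaled_resid = (p_plus_1 / (1 - p_ii)) * d ** 2 / (1 - d ** 2)
H = potential + scaled_resid                   # Hadi's influence measure

print("Observations ranked by Hadi's measure (largest first):",
      np.argsort(H)[::-1][:5])
```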

The Potential-Residual Plot

The index plot of a single measure can be used to detect one kind of unusual observation at a time, e.g. outliers or high leverage points. The potential-residual (P-R) plot can be used to detect two different kinds of unusual observations simultaneously. The P-R plot is obtained by plotting the potential function
$$\frac{p_{ii}}{1 - p_{ii}}$$
against the scaled residual function
$$\frac{p+1}{1 - p_{ii}} \cdot \frac{d_i^2}{1 - d_i^2}.$$
Points far out along the potential axis are high leverage points; points far out along the residual axis are outliers in the response.
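A sketch of the P-R plot itself, reusing the same hand-computed potential and scaled residual terms (synthetic data; matplotlib assumed for plotting):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

p_ii = influence.hat_matrix_diag
e = results.resid
d = e / np.sqrt(np.sum(e ** 2))
p_plus_1 = int(results.df_model) + 1

potential = p_ii / (1 - p_ii)
scaled_resid = (p_plus_1 / (1 - p_ii)) * d ** 2 / (1 - d ** 2)

# P-R plot: leverage problems appear high on the vertical axis,
# response outliers appear far out on the horizontal axis.
plt.scatter(scaled_resid, potential)
plt.xlabel("scaled residual function")
plt.ylabel("potential function")
plt.title("Potential-Residual plot")
plt.show()
```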

Some observations may be flagged as high leverage points, outliers, or influential points. All such points should be carefully examined for accuracy (gross errors, transcription errors), relevancy (whether the point belongs to the data), and special significance (abnormal conditions, unique situations). Points with high leverage that are not influential do not cause problems; points with high leverage that are influential should be investigated.

Examples with MLR

The examples above are based on one response Y and one predictor variable (X4) for simplicity of presentation. However, the results are valid for any number of predictor variables. For the New York Rivers data, if all 4 predictor variables are included, we can draw the same index plots and analyze them in the same way.

This is the matrix plot for the New York Rivers data. See Page 6 for the data description and Page 10 of the textbook for the data.

These are the residual plots.

These are the index plots of the three influence measures ($C_i$, $\mathrm{DFITS}_i$, and $H_i$).
