Linear Regression Analysis 5E Montgomery, Peck and Vining 1 Chapter 6 Diagnostics for Leverage and Influence.

Slides:



Advertisements
Similar presentations
1 Outliers and Influential Observations KNN Ch. 10 (pp )
Advertisements

Prediction, Correlation, and Lack of Fit in Regression (§11. 4, 11
HSRP 734: Advanced Statistical Methods July 24, 2008.
Psychology 202b Advanced Psychological Statistics, II February 8, 2011.
Lecture 25 Multiple Regression Diagnostics (Sections )
1 Chapter 3 Multiple Linear Regression Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Multivariate Data Analysis Chapter 4 – Multiple Regression.
Class 6: Tuesday, Sep. 28 Section 2.4. Checking the assumptions of the simple linear regression model: –Residual plots –Normal quantile plots Outliers.
Linear statistical models 2008 Model diagnostics  Residual analysis  Outliers  Dependence  Heteroscedasticity  Violations of distributional assumptions.
Lecture 24 Multiple Regression (Sections )
Chapter Topics Types of Regression Models
Class 7: Thurs., Sep. 30. Outliers and Influential Observations Outlier: Any really unusual observation. Outlier in the X direction (called high leverage.
Linear and generalised linear models
Chapter 9 Multicollinearity
Linear Regression Analysis 5E Montgomery, Peck & Vining Hidden Extrapolation in Multiple Regression In prediction, exercise care about potentially.
Basics of regression analysis
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. A PowerPoint Presentation Package to Accompany Applied Statistics.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Simple linear regression and correlation analysis
1 MULTI VARIATE VARIABLE n-th OBJECT m-th VARIABLE.
Multiple Linear Regression - Matrix Formulation Let x = (x 1, x 2, …, x n )′ be a n  1 column vector and let g(x) be a scalar function of x. Then, by.
© 2004 Prentice-Hall, Inc.Chap 15-1 Basic Business Statistics (9 th Edition) Chapter 15 Multiple Regression Model Building.
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
1 Chapter 3 Multiple Linear Regression Multiple Regression Models Suppose that the yield in pounds of conversion in a chemical process depends.
Stat 112 Notes 15 Today: –Outliers and influential points. Homework 4 due on Thursday.
Lecture 13 Diagnostics in MLR Variance Inflation Factors Added variable plots Identifying outliers BMTRY 701 Biostatistical Methods II.
1 Reg12M G Multiple Regression Week 12 (Monday) Quality Control and Critical Evaluation of Regression Results An example Identifying Residuals Leverage:
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Anaregweek11 Regression diagnostics. Regression Diagnostics Partial regression plots Studentized deleted residuals Hat matrix diagonals Dffits, Cook’s.
Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Dr. C. Ertuna1 Issues Regarding Regression Models (Lesson - 06/C)
6-3 Multiple Regression Estimation of Parameters in Multiple Regression.
Slide 1 DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Lecture 2: Review of Multiple Regression (Ch. 4-5)
Multiple Regression I 1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 4 Multiple Regression Analysis (Part 1) Terry Dielman.
12/17/ lecture 111 STATS 330: Lecture /17/ lecture 112 Outliers and high-leverage points  An outlier is a point that has a larger.
Outliers and influential data points. No outliers?
Applied Quantitative Analysis and Practices LECTURE#31 By Dr. Osman Sadiq Paracha.
Case Selection and Resampling Lucila Ohno-Machado HST951.
Applied Quantitative Analysis and Practices LECTURE#30 By Dr. Osman Sadiq Paracha.
Lecture 13 Diagnostics in MLR Added variable plots Identifying outliers Variance Inflation Factor BMTRY 701 Biostatistical Methods II.
Statistical Data Analysis 2010/2011 M. de Gunst Lecture 10.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section.
Individual observations need to be checked to see if they are: –outliers; or –influential observations Outliers are defined as observations that differ.
Lab 4 Multiple Linear Regression. Meaning  An extension of simple linear regression  It models the mean of a response variable as a linear function.
DATA ANALYSIS AND MODEL BUILDING LECTURE 9 Prof. Roland Craigwell Department of Economics University of the West Indies Cave Hill Campus and Rebecca Gookool.
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
Unit 9: Dealing with Messy Data I: Case Analysis
Chapter 6 Diagnostics for Leverage and Influence
Statistical Quality Control, 7th Edition by Douglas C. Montgomery.
Multiple Linear Regression
Non-Linear Models Tractable non-linearity Intractable non-linearity
Regression Diagnostics
Validation of Regression Models
Hypothesis Tests: One Sample
Regression Model Building - Diagnostics
Diagnostics and Transformation for SLR
1. Describe the Form and Direction of the Scatterplot.
Chapter 3 Multiple Linear Regression
Motivational Examples Three Types of Unusual Observations
Multiple Linear Regression
Tutorial 8 Table 3.10 on Page 76 shows the scores in the final examination F and the scores in two preliminary examinations P1 and P2 for 22 students in.
Simple Linear Regression
Three Measures of Influence
Regression Model Building - Diagnostics
Outliers and Influence Points
Linear Regression and Correlation
Diagnostics and Transformation for SLR
Model Adequacy Checking
Presentation transcript:

Linear Regression Analysis 5E Montgomery, Peck and Vining 1 Chapter 6 Diagnostics for Leverage and Influence

Linear Regression Analysis 5E Montgomery, Peck and Vining Importance of Detecting Influential Observations Leverage Point: –unusual x-value; –very little effect on regression coefficients.

Linear Regression Analysis 5E Montgomery, Peck and Vining Importance of Detecting Influential Observations Influence Point: unusual in y and x;

Linear Regression Analysis 5E Montgomery, Peck and Vining Leverage The hat matrix is: H = X(XX) - 1 X The diagonal elements of the hat matrix are given by h ii = x i (XX) -1 x i h ii – standardized measure of the distance of the ith observation from the center of the x- space.

Linear Regression Analysis 5E Montgomery, Peck and Vining Leverage The average size of the hat diagonal is p/n. Traditionally, any h ii > 2p/n indicates a leverage point. An observation with large h ii and a large residual is likely to be influential

Linear Regression Analysis 5E Montgomery, Peck and Vining 6

7 Example 6.1 The Delivery Time Data Examine Table 6.1; if some possibly influential points are removed here is what happens to the coefficient estimates and model statistics:

Linear Regression Analysis 5E Montgomery, Peck and Vining Measures of Influence The influence measures discussed here are those that measure the effect of deleting the ith observation. 1.Cook’s D i, which measures the effect on 2.DFBETAS j(i), which measures the effect on 3.DFFITS i, which measures the effect on 4.COVRATIO i, which measures the effect on the variance-covariance matrix of the parameter estimates.

Linear Regression Analysis 5E Montgomery, Peck and Vining Measures of Influence: Cook’s D What contributes to D i : 1.How well the model fits the ith observation, y i 2.How far that point is from the remaining dataset. Large values of D i indicate an influential point, usually if D i > 1.

Linear Regression Analysis 5E Montgomery, Peck and Vining 10

Linear Regression Analysis 5E Montgomery, Peck and Vining Measures of Influence: DFFITS and DFBETAS DFBETAS – measures how much the regression coefficient changes in standard deviation units if the ith observation is removed. where is an estimate of the jth coefficient when the ith observation is removed. Large DFBETAS indicates ith observation has considerable influence. In general, |DFBETAS j,i | > 2/

Linear Regression Analysis 5E Montgomery, Peck and Vining Measures of Influence: DFFITS and DFBETAS DFFITS – measures the influence of the ith observation on the fitted value, again in standard deviation units. Cutoff: If |DFFITS i | > 2, the point is most likely influential.

Linear Regression Analysis 5E Montgomery, Peck and Vining Measures of Influence: DFFITS and DFBETAS Equivalencies See the computational equivalents of both DFBETAS and DFFITS (page 217). You will see that they are both functions of R- student and h ii.

Linear Regression Analysis 5E Montgomery, Peck and Vining 14

Linear Regression Analysis 5E Montgomery, Peck and Vining A Measure of Model Performance Information about the overall precision of estimation can be obtained through another statistic, COVRATIO i

Linear Regression Analysis 5E Montgomery, Peck and Vining A Measure of Model Performance Cutoffs and Interpretation If COVRATIO i > 1, the ith observation improves the precision. If COVRATIO i < 1, ith observation can degrade the precision. Or, Cutoffs: COVRATIO i > 1 + 3p/n or COVRATIO i 3p).

Linear Regression Analysis 5E Montgomery, Peck and Vining 17

Linear Regression Analysis 5E Montgomery, Peck and Vining Detecting Groups of Influential Observations Previous diagnostics were “single- observation” It is possible that a group of points have high-leverage or exert undue influence on the regression model. Multiple-observation deletion diagnostic can be implemented.

Linear Regression Analysis 5E Montgomery, Peck and Vining Detecting Groups of Influential Observations Cook’s D can be extended to incorporate multiple observations: where i denotes the m  1 vector of indices specifying the points to be deleted. Large values of D i indicate that the set of m points are influential.

Linear Regression Analysis 5E Montgomery, Peck and Vining Treatment of Influential Observations Should an influential point be discarded? Yes, if –there is an error in recording a measured value; –the sample point is invalid; or, –the observation is not part of the population that was intended to be sampled No, if –the influential point is a valid observation.

Linear Regression Analysis 5E Montgomery, Peck and Vining Treatment of Influential Observations Robust estimation techniques –These techniques offer an alternative to deleting an influential observation. –Observations are retained but downweighted in proportion to residual magnitude or influence.