Outliers and Influence Points

Outliers and Influence Points: NCSS metrics and descriptions

Diagnostics for Outliers and High Influence Points

Outliers from the model
  Residual – yi - yihat; measures the distance from the data to the model.
  Standardized residual – the residual divided by its standard deviation, so that the standardized residuals have constant variance. Cutoff: 2 or 3, chosen a priori.
  Rstudent – the standardized residual with sj (the root MSE calculated without observation j, i.e. the square root of MSEj) rather than s (the root MSE) in the denominator.

Outliers in the data (high leverage)
  Hat diagonal – see text pages 206–208. Cutoff: 4/N.

High influence points
  Cook's D – attempts to measure the influence of each observation on all N fitted values, i.e. on all estimated parameters. Cutoff: 0.5 or 1 or 4/(N-2).
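The metrics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not NCSS's implementation: it assumes a simple linear regression with an intercept and one predictor (so p = 2 estimated parameters), uses the standard closed-form expressions for the hat diagonal and the deleted MSE, and the function name outlier_diagnostics and the data are made up for the example.

```python
from math import sqrt

def outlier_diagnostics(x, y):
    """Residual-based diagnostics for a straight-line fit y = b0 + b1*x.

    Assumption for this sketch: one predictor plus an intercept, so p = 2.
    """
    n, p = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)

    # Hat diagonal (leverage) for simple linear regression.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

    # Standardized residual: residual over its estimated standard deviation.
    std = [e / sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]

    # Rstudent: same ratio, but with the MSE computed without observation j.
    mse_del = [(sse - e * e / (1 - h)) / (n - p - 1) for e, h in zip(resid, lev)]
    rstud = [e / sqrt(m * (1 - h)) for e, m, h in zip(resid, mse_del, lev)]

    # Cook's D combines residual size and leverage.
    cooks = [r * r / p * h / (1 - h) for r, h in zip(std, lev)]

    return {"residual": resid, "leverage": lev, "standardized": std,
            "rstudent": rstud, "cooks_d": cooks}

# Hypothetical data: roughly y = 2x + 1, with the last point pulled upward.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 30.0]
d = outlier_diagnostics(x, y)
flagged = [i for i in range(len(x))
           if abs(d["standardized"][i]) > 2 or d["cooks_d"][i] > 1]
print(flagged)
```

With these made-up data the last observation exceeds both the standardized-residual cutoff of 2 and the Cook's D cutoff of 1, while its hat diagonal stays below 4/N, which shows that the three groups of metrics flag different things.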

Diagnostics for Outliers and High Influence Points (continued)

High influence points
  Dffits – attempts to measure the influence of an observation on its individual prediction. Cutoff: 1.
  CovRatio – flags observations that are influential on the generalized variance of the regression coefficients. Cutoff: 1 - 3p/N.
  DFBETAS – measures the influence of an observation on the estimated BETA coefficient. Cutoff: 2/root(N) or 1 or 2.
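A leave-one-out refit makes these three deletion metrics concrete. The sketch below is illustrative only and is not how NCSS computes them: it assumes a simple linear regression (p = 2), refits the line N times rather than using the closed-form shortcuts, and the names fit_line and influence_diagnostics and the data are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares line; returns intercept, slope, SSE, mean of x, Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, xbar, sxx

def influence_diagnostics(x, y):
    """DFFITS, CovRatio and DFBETAS by refitting without each observation.

    Assumptions for this sketch: x and y are Python lists, and the model
    has an intercept and one slope, so p = 2.
    """
    n, p = len(x), 2
    b0, b1, sse, xbar, sxx = fit_line(x, y)
    mse = sse / (n - p)
    # Diagonal of (X'X)^-1 for the intercept and the slope.
    c00 = 1 / n + xbar ** 2 / sxx
    c11 = 1 / sxx

    out = {"dffits": [], "covratio": [], "dfbetas_b0": [], "dfbetas_b1": []}
    for i in range(n):
        # Fit with observation i deleted.
        b0_i, b1_i, sse_i, _, _ = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        mse_i = sse_i / (n - 1 - p)
        h = 1 / n + (x[i] - xbar) ** 2 / sxx
        rstudent = (y[i] - b0 - b1 * x[i]) / sqrt(mse_i * (1 - h))

        out["dffits"].append(rstudent * sqrt(h / (1 - h)))
        out["covratio"].append((mse_i / mse) ** p / (1 - h))
        out["dfbetas_b0"].append((b0 - b0_i) / sqrt(mse_i * c00))
        out["dfbetas_b1"].append((b1 - b1_i) / sqrt(mse_i * c11))
    return out

# Hypothetical data: roughly y = 2x + 1, with the last point pulled upward.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 30.0]
infl = influence_diagnostics(x, y)
n, p = len(x), 2
print(abs(infl["dffits"][-1]) > 1,
      infl["covratio"][-1] < 1 - 3 * p / n,
      abs(infl["dfbetas_b1"][-1]) > 2 / sqrt(n))
```

Refitting N times is wasteful for large data sets, but it keeps each definition visible: every metric compares the full fit with the fit obtained after deleting one observation.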