1 Reg12M G89.2229 Multiple Regression Week 12 (Monday)
Quality Control and Critical Evaluation of Regression Results
An example
Identifying Residuals
Leverage: X(X'X)⁻¹X'
Residuals (Discrepancies)

2 Reg12M Quality Control and Critical Evaluation of Regression Results
Multiple regression programs have the potential to hide details of the data
»Regression output provides only summary information
»When several variables are considered, bivariate plots may not be immediately revealing
Regression diagnostic indicators can help with quality control
»Which subjects are not well fit: discrepancy (residuals)
»Which subjects are affecting the results: influential points

3 Reg12M An example
Suppose X is a measure of SES advantage, such as years of education, W is an indicator of social disadvantage, such as immigrant status, and Y is the number of stressful life events.
Nothing necessarily jumps out from these data as being amiss.

4 Reg12M Example, Continued
Here are the regression results.
Here is the plot of the fit.
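The results table and plot from this slide are not reproduced in the transcript. As a rough stand-in (invented data, not the course example), the sketch below simulates variables in the roles just described and fits the regression by ordinary least squares using the matrix formula introduced on later slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# Simulated stand-ins for the example variables (invented values, not the course data)
educ = rng.normal(14, 2, n)             # X: years of education (SES advantage)
immig = rng.binomial(1, 0.3, n)         # W: immigrant status indicator
stress = 8 - 0.3 * educ + 1.5 * immig + rng.normal(0, 1, n)  # Y: stressful life events

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), educ, immig])
y = stress

# OLS estimates: B = (X'X)^-1 X'Y
B = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ B
resid = y - fitted
print("coefficients (intercept, educ, immig):", np.round(B, 3))
print("residual SD:", round(resid.std(ddof=X.shape[1]), 3))
```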

5 Reg12M Identifying Residuals
Outlying residuals should be examined. They may stand out when they are in the center of the X distribution.
When the outlying point in the plot is deleted, we get the following results:
An important question is whether it is proper to delete an outlying point.
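A minimal sketch of the delete-and-refit check, on an invented toy dataset with one miscoded Y value near the center of the X range: it drops the case with the largest absolute residual and compares the slopes. Whether such a deletion is actually proper is the substantive question the slide raises, and the code cannot answer it:

```python
import numpy as np

# Invented toy data: a clean line plus one miscoded Y value near the center of X
x = np.arange(1.0, 13.0)
y = 2.0 + 0.5 * x + np.random.default_rng(1).normal(0, 0.3, x.size)
y[5] += 6.0                                    # the data-entry error

def ols(x, y):
    X = np.column_stack([np.ones(x.size), x])
    return np.linalg.solve(X.T @ X, X.T @ y)

b_all = ols(x, y)
resid = y - np.column_stack([np.ones(x.size), x]) @ b_all
worst = np.argmax(np.abs(resid))               # case with the largest |residual|

keep = np.arange(x.size) != worst
b_drop = ols(x[keep], y[keep])
print("slope with all cases:   ", round(b_all[1], 3))
print("slope with case dropped:", round(b_drop[1], 3))
```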

6 Reg12M Sometimes Data Errors Don't Stand Out
If an erroneous point in the Y data is associated with an extreme X value, it may not show up as a large residual.
»The OLS fit will be influenced by the bad point
How extreme a point is in the X space is measured by its leverage.
An observation has high leverage if it has a relatively large value of h_i, the corresponding diagonal element of the matrix X(X'X)⁻¹X'.
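A small illustration (invented data) of computing each case's leverage h_i as a diagonal element of X(X'X)⁻¹X'. The 2p/n flagging cutoff used here is a common rule of thumb, not something stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = np.append(rng.normal(0, 1, n - 1), 6.0)    # one case far out in the X space
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix X(X'X)^-1 X'
h = np.diag(H)                                 # leverages h_i

p = X.shape[1]
cutoff = 2 * p / n                             # common rule of thumb (assumption, not from the slide)
print("max leverage:", round(h.max(), 3), "cutoff:", round(cutoff, 3))
print("high-leverage cases:", np.where(h > cutoff)[0])
```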

7 Reg12M Leverage: X(X'X)⁻¹X'
X(X'X)⁻¹X' is an n by n matrix that transforms Y into the fitted values Ŷ
»Ŷ = XB = X[(X'X)⁻¹X'Y] = [X(X'X)⁻¹X']Y
It is a square, symmetric matrix with a special property: its square is itself!
»[X(X'X)⁻¹X'][X(X'X)⁻¹X'] = [X(X'X)⁻¹X']
This so-called hat matrix can be thought of as a camera that projects an image of the data Y onto the regression plane.
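A quick numerical check (invented data) of the hat-matrix properties described above: symmetry, idempotence, and the fact that multiplying Y by the hat matrix yields the fitted values:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
y = rng.normal(size=15)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # the hat matrix X(X'X)^-1 X'

# Symmetric and idempotent: H = H' and HH = H
print(np.allclose(H, H.T), np.allclose(H @ H, H))

# H "puts the hat on Y": HY equals the fitted values XB
B = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ B))
```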

8 Reg12M Standardized Residuals
Assume that the fitted values of Y have been obtained
»Ŷ = XB
Residuals are calculated
»E = Y - Ŷ
SPSS computes the standardized residual as the ratio of the unstandardized residual to the square root of the MSE.
»The square root of MSE is S_E
»The SPSS standardized residual is E_i / S_E
»Values greater than about 3 in absolute value are suspect
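A sketch of the SPSS-style standardized residual on invented data; the only assumption beyond the slide is that MSE is computed on n - k - 1 degrees of freedom, with k predictors plus an intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)
y[0] += 5.0                                    # implant a discrepant case

B = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ B                                  # residuals E = Y - Yhat
s_e = np.sqrt(e @ e / (n - k - 1))             # square root of MSE on n - k - 1 df
std_resid = e / s_e                            # SPSS-style standardized residual
print("cases with |standardized residual| > 3:", np.where(np.abs(std_resid) > 3)[0])
```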

9 Reg12M Variance of Regression Estimates
Let Y be an n × 1 vector and X be an n × q matrix of predictors:
Y = XB + e
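The derivation itself is not in the transcript. Under the usual assumption that the errors are uncorrelated with constant variance σ², the standard results this slide presumably builds toward are:

```latex
\hat{B} = (X'X)^{-1}X'Y, \qquad
\operatorname{Var}(\hat{B}) = (X'X)^{-1}X'\,\operatorname{Var}(e)\,X(X'X)^{-1}
                            = \sigma^{2}(X'X)^{-1}
\quad \text{when } \operatorname{Var}(e) = \sigma^{2} I .
```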

10 Reg12M Studentized Residuals
The precision of the estimate of a residual varies with the leverage of the observation
Instead of comparing every residual to the same reference distribution, compare each residual to its own standard error
»Called the Studentized residual
»Cohen et al. call this the internally studentized residual
The SPSS studentized residual is E_i / [S_E √(1 - h_i)]
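A sketch of the internally studentized residual E_i / [S_E √(1 - h_i)] on invented data, combining the leverage and MSE computations from the previous slides:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)

B = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ B                                       # residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)       # leverages h_i
s_e = np.sqrt(e @ e / (n - k - 1))                  # square root of MSE

studentized = e / (s_e * np.sqrt(1 - h))            # internally studentized residuals
print("largest |studentized residual|:",
      round(np.abs(studentized).max(), 2))
```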

11 Reg12M Externally Studentized Residuals
If the i-th residual reveals a mistake, then S_E on (n - k - 1) df will be too big
»Externally studentized residuals adjust for this
»The regression is re-estimated dropping the i-th observation, and a new S_E(i) is computed on (n - k - 2) df
»The discrepancy of the point is computed by comparing Y_i to the fitted value from the model that excludes the point in question
»The externally studentized residual is E_i / [S_E(i) √(1 - h_i)]
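A sketch of the externally studentized residual on invented data, computed the long way: the regression is literally refit without case i to get S_E(i) on n - k - 2 df, as the slide describes (in practice a shortcut formula avoids the n refits; this sketch does not use it):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0, 1, n)
y[3] += 4.0                                          # implant a discrepant case

B = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ B                                        # full-sample residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)        # leverages h_i

t_ext = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Bi = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    ei = y[keep] - X[keep] @ Bi
    s_i = np.sqrt(ei @ ei / (n - k - 2))             # S_E(i) on n - k - 2 df
    t_ext[i] = e[i] / (s_i * np.sqrt(1 - h[i]))      # externally studentized residual

worst = np.argmax(np.abs(t_ext))
print("most discrepant case:", worst, "t =", round(t_ext[worst], 2))
```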

12 Reg12M Example Sorted by Discrepancy Measures
Both the studentizing and deleting operations tend to increase the size of the standardized residuals.

13 Reg12M Discrepancies and Quality Control of Data
When carrying out data analysis, it is important to make sure the data are clean.
»Initially we look at distributions and scatterplots
»Out-of-range values are especially important
Discrepancy analysis is a second-order cleaning step
»Some discrepancies may be clear errors
»Other discrepancies may reveal special populations or circumstances