Motivational Examples Three Types of Unusual Observations


Lecture 15 Outline: Motivational Examples; Three Types of Unusual Observations
12/3/2018 ST3131, Lecture 15

(Problem 4.3, Page 116) Computer Repair Data
(1) For the data on Page 27, n=14, p=1
(a) Fit a linear regression model relating Minutes to Units.

Check each of the standard regression assumptions and indicate which assumption(s) seem to be violated: assumptions about the form of the model; assumptions about the measurement errors; assumptions about the predictor variables; assumptions about the observations.

(1) For the data on Page 117, n=24, p=1
(a) Fit a linear regression model relating Minutes to Units.

Check each of the standard regression assumptions and indicate which assumption(s) seem to be violated: assumptions about the form of the model; assumptions about the measurement errors; assumptions about the predictor variables; assumptions about the observations.

Leverage, Influence, and Outliers

The assumption about the observations requires that each observation play a similar role in the regression fit; that is, the fit should not be overly determined by one or a few observations. If such points exist, they need to be identified. High-leverage points, influential points, and outliers are such points.

High-Leverage Points (Outliers in the Predictor Variables)

p_ii, the i-th diagonal element of the projection (hat) matrix, is called the leverage of observation x_i. It reflects how far x_i is from the sample mean of the predictor variables. The function p_ii / (1 - p_ii) is called the potential function of observation x_i. Observations with larger p_ii are called high-leverage points. Points with p_ii greater than 2(p+1)/n are usually regarded as high-leverage points. High-leverage points are also called outliers in the predictor variables (in the X-direction).
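The leverage calculation can be illustrated with a minimal sketch for the simple regression case (one predictor plus an intercept), where the leverage has the closed form 1/n + (x_i - x_bar)^2 / Sxx. The data values, the function name `leverages`, the cutoff 2(p+1)/n, and the potential function p_ii / (1 - p_ii) are assumptions of this sketch, chosen as commonly used conventions rather than taken from the lecture data.

```python
def leverages(x):
    """Leverage p_ii of each observation in a simple linear regression
    with an intercept: 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
lev = leverages(x)

# Rule-of-thumb cutoff (assumed here): twice the average leverage, 2(p+1)/n.
num_predictors = 1
cutoff = 2 * (num_predictors + 1) / len(x)
high_leverage = [i for i, p_ii in enumerate(lev) if p_ii > cutoff]

# Potential function p_ii / (1 - p_ii); it grows quickly as p_ii nears 1.
potential = [p_ii / (1.0 - p_ii) for p_ii in lev]

print("leverages:", [round(v, 3) for v in lev])
print("cutoff:", round(cutoff, 3), "high-leverage indices:", high_leverage)
```

The leverages always sum to p+1 (here 2), which is why twice the average, 2(p+1)/n, is a natural yardstick.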

New York Rivers Data (Page 10)

A linear regression fit relating Nitrogen to Com./Indus. A plot of the leverage values (index plot, dot plot, or box plot) will reveal high-leverage observations. From the raw data, it can be seen that Observation 5 (the Hackensack River) is an urban river close to New York City, while the other rivers are in rural areas.

Outliers in the Response Variable (in the Y-direction)

Observations with large standardized residuals are outliers in the response variable. The response values of these outliers are far from the sample center of the response variable (in the Y-direction), and their standardized residuals are correspondingly far from 0. Observations with absolute standardized residuals greater than 2 or 3 are usually called outliers. A plot of the standardized residuals (index plot, dot plot, box plot, or plot against fitted values) will usually reveal outliers in the Y-direction.
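As a rough illustration of the rule above, the sketch below fits a simple linear regression and flags observations whose absolute standardized residual exceeds 2. It assumes the internally standardized form e_i / (sigma_hat * sqrt(1 - p_ii)), which may differ from the exact definition used in the lecture; the data and the helper names `fit_line` and `standardized_residuals` are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def standardized_residuals(x, y):
    """Internally standardized residuals e_i / (sigma_hat * sqrt(1 - p_ii))."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (p = 1).
    sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (sigma_hat * math.sqrt(1.0 - p)) for e, p in zip(residuals, lev)]


# Hypothetical data: roughly y = 2x + 1, except the sixth response (20.0).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 20.0, 15.0, 16.9, 19.1, 21.0]

std_res = standardized_residuals(x, y)
outliers = [i for i, r in enumerate(std_res) if abs(r) > 2]
print("outlier indices:", outliers)
```

Note that a single large residual inflates sigma_hat, which shrinks every standardized residual; this is one reason a cutoff of 2 rather than 3 is sometimes preferred for small samples.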

Influential Points

A point is an influential point if its deletion, singly or in combination with others (2 or 3), causes substantial changes in the fitted model (coefficient estimates, fitted values, t-tests, etc.).
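The deletion idea in this definition can be sketched directly: refit the model with each observation left out in turn and see how much the slope moves. This is only an illustration of single-point deletion on hypothetical data, not one of the formal influence measures; the helper name `fit_line` and the data values are assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: five points near y = 2x, plus one distant point
# (x = 20) whose response falls well below that line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0]

_, slope_full = fit_line(x, y)

# Refit with observation i deleted and record the resulting slope.
slope_without = []
for i in range(len(x)):
    x_del = x[:i] + x[i + 1:]
    y_del = y[:i] + y[i + 1:]
    slope_without.append(fit_line(x_del, y_del)[1])

slope_changes = [s - slope_full for s in slope_without]
most_influential = max(range(len(x)), key=lambda i: abs(slope_changes[i]))

print("slope with all points:", round(slope_full, 3))
print("slope changes on deletion:", [round(c, 3) for c in slope_changes])
print("most influential index:", most_influential)
```

In this example the distant point is both high-leverage and influential: deleting it moves the slope from about 0.59 to about 1.99, while deleting any other point changes it only slightly.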

After-class Questions: How can the influence of an observation be measured? How can an influential observation be detected? Are high-leverage points influential? Are outliers influential?