Worked Example Using R

> plot(y~x)

> plot(epsilon1~x) This is a plot of the residuals against the explanatory variable, x.

> plot(epsilon1~yhat) This is a plot of the residuals against the fitted values, yhat.

Both graphs show the same thing: the residuals follow a random pattern, with no systematic structure. Note: since the fitted equation is approximately y = x, the two plots are almost identical in this case.
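The objects epsilon1 and yhat are not defined in the transcript; presumably they were constructed from a fitted linear model. One way to reproduce the two plots, assuming those names (the data here are simulated, since the original data are not shown), is:

```r
# Simulated data lying close to y = x, as in the example
# (assumed -- the original dataset is not in the transcript)
set.seed(1)
x <- 1:20
y <- x + rnorm(20, sd = 0.5)

model1   <- lm(y ~ x)       # fit the simple linear model
epsilon1 <- resid(model1)   # residuals
yhat     <- fitted(model1)  # fitted values

plot(epsilon1 ~ x)          # residuals against the explanatory variable
plot(epsilon1 ~ yhat)       # residuals against the fitted values
```

Because the fitted slope is close to 1, the two plots differ only by a rescaling of the horizontal axis, which is why they look so similar.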

Model Diagnostics: Residuals and Influence

Consider again the problem of fitting the model y_i = f(x_i) + ε_i, i = 1, …, n. Assume again a single continuous response variable y. The explanatory variable x may be either a single variable, or a vector of variables. How do we assess the quality of a given fit f?

While summary statistics are helpful, they are not sufficient. Good diagnostics are typically based on case analysis, i.e. an examination of each observation in turn in relation to the fitting procedure. This leads to an examination of residuals and influence.

Residuals The residuals should be thought of as what is left of the values of the response variable after the fit has been subtracted. Ideally they should show no further dependence (especially no further location dependence) on x.

In general this should be investigated graphically by plotting residuals against the explanatory variable(s) x. For linear models, we frequently compromise by plotting residuals against fitted values.

In particular, the residuals provide information about:
* whether the best relation has been fitted
* the relative merits of different fits
* mild, but non-random, departures from the hypothesised fit
* the magnitude of the residual variation
* the identification of outliers
* possible further dependence on x, other than through location, of the conditional distribution of y given x, in particular heterogeneity of spread of the residuals.

Example: Anscombe's Artificial Data. The R data frame anscombe is made available by > data(anscombe) This contains 4 artificial datasets, each consisting of 11 observations of a continuous response variable y and a continuous explanatory variable x. Each dataset is now plotted along with its least squares linear fit.

All the usual summary statistics related to the classical analyses of the fitted models are identical across the 4 datasets. This includes the estimated coefficients â and b̂ and their standard errors and confidence intervals, together with the residual standard errors and correlation coefficients.
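This claim is easy to check directly in R; a short sketch:

```r
data(anscombe)

# Fit the least squares line to each of the four datasets:
# y1 ~ x1, y2 ~ x2, y3 ~ x3, y4 ~ x4
fits <- lapply(1:4, function(i) {
  lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
})

# The estimated intercept and slope agree to 2-3 decimal places
# across all four datasets: approximately 3.00 and 0.50
sapply(fits, function(f) round(coef(f), 2))

# So do the R-squared values, all approximately 0.67
sapply(fits, function(f) round(summary(f)$r.squared, 2))
```

The point of the example is that these identical summaries conceal four radically different relationships, which only the plots and residuals reveal.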

Consideration of the residuals shows that very different judgements should be made about the appropriateness of the fitted model in each of the 4 cases. A full discussion is given by Weisberg (1985, pp. 107-108).

Influence Influence measures the extent to which a fit is affected by individual observations. A possible formal definition is the following: the influence of any observation is a measure of the difference between the fit and the fit which would be obtained if that observation were omitted.

Obviously, observations with large influence require more careful checking. Especially for linear models, influence is often measured by Cook's distance.

Cook’s Distance Formula
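The formula itself did not survive the transcript (it appeared as an image on the slide). A standard form, for a linear model with p estimated parameters, is:

```latex
D_i = \frac{r_i^2}{p}\,\frac{h_{ii}}{1-h_{ii}}
```

where r_i is the i-th standardized residual and h_ii is the i-th leverage, the i-th diagonal element of the hat matrix. This makes explicit that influence combines the size of the residual with the leverage of the observation's x-value.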

As a rule of thumb, observations for which D_i > 1 make a noticeable difference to the parameter estimates, and should be examined carefully for the appropriateness of their use in fitting the model. Clearly an observation with a large residual can also have a large influence. However, an observation with an unusual value of its explanatory variable(s) can pull a fit towards it and so have a large influence despite a small residual.

Example: Anscombe's third data set. The last graph produced by the plot function shows that observation number 3 has an unusually large value of Cook's distance, D_3. > plot(model3) produces:
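The numerical value of D_3 is cut off in the transcript, but it can be recomputed directly with the stats function cooks.distance (model3 is assumed to be the least squares fit to the third dataset):

```r
data(anscombe)
model3 <- lm(y3 ~ x3, data = anscombe)  # fit to the third dataset

# Cook's distance for every observation
D <- cooks.distance(model3)

# Observation 3 (x3 = 13, y3 = 12.74) stands out, and exceeds
# the D_i > 1 rule of thumb
which.max(D)
round(D[3], 2)
```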

We now refit the data omitting this observation.
> x5 = x3[-3]
> y5 = y3[-3]
> model5 = lm(y5~x5)
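Written self-contained (the transcript assumes x3 and y3 are already in the workspace, e.g. via attach(anscombe)), and checking the effect of the omission:

```r
data(anscombe)
x5 <- anscombe$x3[-3]  # drop observation 3
y5 <- anscombe$y3[-3]
model5 <- lm(y5 ~ x5)

# Without the outlier the remaining 10 points are almost exactly
# collinear, so the refitted line is near-perfect
summary(model5)$r.squared
coef(model5)
```

The refit shows how strongly a single high-influence observation was distorting the original slope and intercept estimates.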