Occasionally, we are able to see clear violations of the constant variance assumption by looking at a residual plot: the characteristic "funnel" shape. Often this can be fixed through a variance-stabilizing transformation. If the standard deviation of the response is proportional to the mean, then the logarithm transformation of the response often works: do a regression of log(y) against the explanatory variables. If the variance of the response is proportional to the mean, then the square root transformation of the response often works: do a regression of sqrt(y) against the explanatory variables. Both checks appear in the sketch below.
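Here is a minimal sketch of that workflow on simulated data (the data, the variable names, and the predicted point x = 5 are illustrative assumptions, not from the lecture): the raw fit shows the funnel, the log fit should not, and the last line back-transforms a prediction from the log fit, as discussed in the next paragraph.

#illustrative sketch: simulated data whose standard deviation grows with its mean
set.seed(1)
x <- runif(100, 1, 10)
y <- exp(1 + 0.3*x + rnorm(100, sd = 0.4))

g <- lm(y ~ x)         #raw fit: residuals fan out in a funnel
gl <- lm(log(y) ~ x)   #log fit: residual spread should be roughly constant

par(mfrow = c(1, 2))
plot(fitted(g), resid(g), main = "y ~ x")
plot(fitted(gl), resid(gl), main = "log(y) ~ x")

exp(predict(gl, newdata = data.frame(x = 5)))  #prediction back on the original scale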

In any case, always perform the transformation on the response, then refit the regression and check the residuals to make sure you've found the transformation that gives the best residual plots. Note that if you transform the response you will probably need to express the predictions back in the original scale; so if you fit log(y), the prediction of y will be exp(prediction on the log scale), as in the sketch above. The regression coefficients will have to be interpreted on the transformed scale, though. For the log transform we have a nice interpretation: the fitted model is log(ŷ) = b0 + b1x1 + … + bpxp, so that ŷ = exp(b0)·exp(b1x1)···exp(bpxp).

This implies that an increase of 1 in x1 means that the original response is predicted to change by a factor of exp(b1); this means that the coefficients can be interpreted as multiplicative effects instead of additive ones. Let's consider the Box-Cox method of determining a transformation. It should be used with positive response variables, and the method finds the transformation that gives the best fit. It uses the general family

y(λ) = (y^λ - 1)/λ for λ ≠ 0,   y(λ) = log(y) for λ = 0.

Using maximum likelihood we may find the "best" value of lambda (actually a confidence interval for lambda)… see the R code below.

#read in the gasconsumption data; bring in the MASS library and
#apply the boxcox function to the simple linear model
attach(gasconsumption)
g <- lm(MPG ~ WT)
summary(g)
library(MASS)
boxcox(g, plotit = TRUE)  #plot log-likelihood against lambda; find the maximum
#Notice that values between about -1.5 and .25 are in the 95% confidence
#interval around the maximum. Your authors chose -1 and worked with GPM
#instead of MPG, since GPM = 1/MPG. If you want to find the exact lambda, try this:
l <- boxcox(g)
l$x[l$y == max(l$y)]  #note this is harder to interpret than its rounded value, -1
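As a side note (a sketch, not something your authors did): the 95% interval that boxcox draws consists of all lambda values whose profile log-likelihood lies within qchisq(0.95, 1)/2 of the maximum, so you can extract approximate endpoints directly:

l <- boxcox(g, plotit = FALSE)
range(l$x[l$y > max(l$y) - qchisq(0.95, 1)/2])  #approximate 95% CI for lambda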

Now for practice, load the faraway library and get the dataset called prostate. Look at the help file for the dataset, then go through the various diagnostics we've considered in this chapter to find the best model for predicting log(psa) (a starter sketch follows the homework line):
–check the normality assumption on the errors - are any transformations required?
–find large leverage points & look for outliers
–see if there are influential points
–is the constant variance assumption met?

HW: Work on #6.1, 6.14, 6.15, 6.18, 6.20, 6.21, 6.23
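A minimal starter sketch for the practice problem, not a full solution (in the faraway version of these data the log-PSA response is the variable lpsa, and halfnorm is faraway's half-normal plotting helper):

library(faraway)
data(prostate)
?prostate                            #help file: lpsa is log(psa)

g <- lm(lpsa ~ ., data = prostate)   #full model as a starting point
summary(g)

qqnorm(resid(g)); qqline(resid(g))   #normality of the errors
halfnorm(hatvalues(g))               #large leverage points
sort(abs(rstudent(g)), decreasing = TRUE)[1:3]  #candidate outliers
halfnorm(cooks.distance(g))          #influential points
plot(fitted(g), resid(g))            #constant variance check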