Assumptions in linear regression models


Assumptions in linear regression models (Unit 2)

The model is

    Yi = β0 + β1x1i + … + βkxki + εi,   i = 1, …, n

Assumptions:
- x1i, …, xki are deterministic (not random variables);
- ε1, …, εn are independent random variables with zero mean, i.e. E(εi) = 0, and common variance, i.e. V(εi) = σ².

Consequences: E(Yi) = β0 + β1x1i + … + βkxki and V(Yi) = σ², i = 1, …, n. By the Gauss-Markov theorem, the OLS (ordinary least squares) estimators of β0, …, βk, denoted b0, …, bk, are BLUE (Best Linear Unbiased Estimators).
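The course uses SPSS throughout; as a language-agnostic illustration, here is a minimal Python sketch of the same setup, with made-up data (the regressor values and coefficients below are illustrative, not from the course data files):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    X = rng.normal(size=(n, 2))            # two regressors x1, x2, treated as fixed
    eps = rng.normal(scale=1.0, size=n)    # independent errors, zero mean, common variance
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + eps

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print(model.params)                    # b0, b1, b2: the OLS (BLUE) estimates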

Normality assumption

If, in addition, we assume that the errors are Normal random variables, i.e. ε1, …, εn are independent Normal r.v. with zero mean and common variance σ², εi ~ N(0, σ²), i = 1, …, n, then:
- Yi ~ N(β0 + β1x1i + … + βkxki, σ²), i = 1, …, n;
- bi ~ N(βi, V(bi)), i = 0, …, k.

The normality assumption is needed for reliable inference (confidence intervals and tests of hypotheses): the probability statements are then exact. If the normality assumption does not hold, a large number n of observations allows, under some conditions and via a Central Limit Theorem, reliable asymptotic inference on the estimated betas.
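Continuing the Python sketch above, the normality-based t intervals and tests are available directly from the fitted model object:

    print(model.conf_int(alpha=0.05))   # 95% confidence intervals for b0, b1, b2
    print(model.pvalues)                # t-test p-values for H0: beta_i = 0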

Checking assumptions

The error term ε is unobservable; instead, we can estimate it using the parameter estimates. The regression residual is defined as

    ei = yi − ŷi,   i = 1, 2, …, n

Plots of the regression residuals are fundamental in revealing model inadequacies such as:
- non-normality;
- unequal variances;
- presence of outliers;
- correlation (in time) of the error terms.
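In the running Python sketch, the residuals are simply the observed minus the fitted values (statsmodels also exposes them precomputed):

    e = y - model.fittedvalues          # e_i = y_i - yhat_i
    assert np.allclose(e, model.resid)  # same quantity, stored by statsmodels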

Detecting model lack of fit with residuals

- Plot the residuals ei on the vertical axis against each of the independent variables x1, …, xk on the horizontal axis.
- Plot the residuals ei on the vertical axis against the predicted values ŷ on the horizontal axis.

In each plot, look for trends, dramatic changes in variability, and/or more than 5% of the residuals lying outside ±2s of 0, where s is the residual standard error. Any of these patterns indicates a problem with the model fit. Use the Scatter/Dot graph command in SPSS to construct any of the plots above; a Python version is sketched below.
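A minimal matplotlib version of these plots, continuing the earlier sketch:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    axes[0].scatter(X[:, 0], model.resid); axes[0].set_xlabel("x1")
    axes[1].scatter(X[:, 1], model.resid); axes[1].set_xlabel("x2")
    axes[2].scatter(model.fittedvalues, model.resid); axes[2].set_xlabel("fitted values")
    s = np.sqrt(model.mse_resid)                   # residual standard error s
    for ax in axes:
        ax.axhline(0, color="grey")
        ax.axhline(2 * s, ls="--", color="grey")   # +/- 2s reference bands
        ax.axhline(-2 * s, ls="--", color="grey")
        ax.set_ylabel("residuals")
    print("fraction outside 2s:", np.mean(np.abs(model.resid) > 2 * s))
    plt.show()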

Examples: residuals vs. predicted
[Four example plots: fine; nonlinearity; unequal variances; outliers]

Examples: residuals vs. predicted
[Two example plots: auto-correlation; nonlinearity and auto-correlation]

Partial residuals plot

An alternative method to detect lack of fit in models with more than one independent variable uses the partial residuals. For a selected j-th independent variable xj,

    e* = y − (b0 + b1x1 + … + bj−1xj−1 + bj+1xj+1 + … + bkxk) = e + bjxj

Partial residuals measure the influence of xj after the effects of all the other independent variables have been removed. A plot of the partial residuals for xj against xj often reveals more information about the relationship between y and xj than the usual residual plot: if everything is fine, the points should scatter around a straight line with slope bj. Partial residual plots can be produced in SPSS by selecting "Produce all partial plots" in the "Plots" options of the "Regression" dialog box.
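A sketch of the partial residual computation for x1 (j = 1) in the running Python example:

    b = model.params                        # [b0, b1, b2]
    partial = model.resid + b[1] * X[:, 0]  # e* = e + b_j * x_j, here x_j = x1

    plt.scatter(X[:, 0], partial)
    xs = np.sort(X[:, 0])
    plt.plot(xs, b[1] * xs, color="red")    # reference line with slope b1
    plt.xlabel("x1"); plt.ylabel("partial residuals for x1")
    plt.show()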

Example

A supermarket chain wants to investigate the effect of price on the weekly demand for a house brand of coffee. Eleven prices were randomly assigned to the stores and were advertised using the same procedure. A few weeks later the chain repeated the experiment using no advertising.

Y: weekly demand, in pounds
X1: price, dollars/pound
X2: advertisement (1 = yes, 0 = no)

Model 1: E(Y) = β0 + β1x1 + β2x2
Data: Coffee2.sav

Computer output
[SPSS output for Model 1]

Residual and partial residual (price) plots
[Left: residuals vs. price, showing non-linearity. Right: partial residuals for price vs. price, showing the nature of the non-linearity. Try using 1/x instead of x.]

The transformed model: E(Y) = β0 + β1(1/x1) + β2x2, with the new variable RecPrice = 1/Price.
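A hedged sketch of fitting this transformed model in Python; the column names of Coffee2.sav are not shown on the slides, so demand, price and advert below are assumptions:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_spss("Coffee2.sav")       # requires the pyreadstat package
    df["recprice"] = 1.0 / df["price"]     # RecPrice = 1/Price
    m = sm.OLS(df["demand"],
               sm.add_constant(df[["recprice", "advert"]])).fit()
    print(m.summary())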

Residual and partial residual (1/price) plots
After fitting with the independent variable x1 = 1/price, the residual plot does not show any pattern, and the partial residual plot for 1/price does not show any non-linearity.

An example with simulated data

The true model, supposedly unknown, is Y = 1 + x1 + 2·x2 + 1.5·x1·x2 + ε, with ε ~ N(0,1). Data: Interaz.sav. The task is to fit a model based on the data. Cor(X1, X2) = 0.131.
[Scatter plots of Y against x1 and against x2]
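The simulated setup is easy to reproduce. The slide does not state how x1 and x2 were generated, so the standard-normal draws below are an assumption:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 100
    x1 = rng.normal(size=n)                # generating distribution assumed
    x2 = rng.normal(size=n)
    y = 1 + x1 + 2 * x2 + 1.5 * x1 * x2 + rng.normal(size=n)  # the true model
    sim = pd.DataFrame({"y": y, "x1": x1, "x2": x2})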

Model 1: E(Y) = β0 + β1x1 + β2x2

ANOVA          SS         df    MS        F         Sig.
Regression     8447.42     2    4223.71   768.494   .000
Residual        533.12    97       5.496
Total          8980.54    99

Adj. R² = 0.939

Coefficients    B        Std. Err.    t         Sig.    VIF
(Constant)     -6.092    .630         -9.668    .000
X1              3.625    .207         17.528    .000    1.018
X2              6.145    .189         32.465    .000    1.018

Model 1: standardized residual plot
[Plot of standardized residuals vs. fitted values]
Nonlinearity is present. What is it due to? Since the scatter plots do not show any non-linearity, it could be due to an interaction.

Model 1: partial regression plots
[Partial regression plots for X1 and X2]
They show that the linear effects are roughly fine, but some non-linearity shows up.

Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2

ANOVA          SS         df    MS         F         Sig.
Regression     8885.372    3    2961.791   2987.64   .000
Residual         95.169   96        .991
Total          8980.541   99

Adj. R² = 0.989

Coefficients    B        Std. Err.    t         Sig.    VIF
(Constant)      .305     .405           .753    .453
X1             1.288     .142          9.087    .000    2.648
X2             2.098     .209         10.051    .000    6.857
IntX1X2        1.411     .067         21.018    .000    9.280
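Fitting Model 2 on the simulated data with the statsmodels formula interface (using the sim data frame built above):

    import statsmodels.formula.api as smf

    m2 = smf.ols("y ~ x1 + x2 + x1:x2", data=sim).fit()
    print(m2.summary())   # the interaction term should come out clearly significant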

Model 2: standardized residual plot
[Plot of standardized residuals vs. fitted values: looks fine]

Model 2: partial regression plots
[Partial regression plots for X1, X2 and X1X2]
All plots show linearity of the corresponding terms. Maybe an outlier is present.

Model 3: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2²

Suppose we want to try fitting a quadratic term.

ANOVA          SS         df    MS        F         Sig.
Regression     8890.686    4    2222.67   2349.92   .000
Residual         89.856   95        .946
Total          8980.541   99

Adj. R² = 0.990

Coefficients    B        Std. Err.    t         Sig.    VIF
(Constant)      .023     .413           .055    .956
X1             1.258     .139          9.051    .000     2.670
X2             2.615     .299          8.757    .000    14.713
IntX1X2        1.436     .066         21.619    .000     9.528
X2Square       -.137     .058         -2.370    .020    11.307

The x2² term seems fine (significant), but the multicollinearity is higher (larger VIFs).
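The higher multicollinearity can be checked directly via the VIFs; a sketch on the simulated data, continuing sim from above:

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    sim["x1x2"] = sim["x1"] * sim["x2"]
    sim["x2sq"] = sim["x2"] ** 2
    Xd = sm.add_constant(sim[["x1", "x2", "x1x2", "x2sq"]])
    for i, name in enumerate(Xd.columns):
        if name != "const":                # skip the intercept column
            print(name, variance_inflation_factor(Xd.values, i))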

Model 3: standardized residual plot
[Plot of standardized residuals vs. fitted values: looks fine]

Model 3: partial regression plots
[Partial regression plots for X1, X2, X1X2 and X2²]
The plot for X2² does not show linearity.

Checking the normality assumption

The inference procedures on the estimates (tests and confidence intervals) are based on the Normality assumption for the error term ε. If this assumption is not satisfied, the conclusions drawn may be wrong. Again, the residuals ei are used for checking this assumption. Two widely used graphical tools are:
- the P-P plot for Normality of the residuals;
- the histogram of the residuals, compared with the Normal density function.

The P-P plot for Normality and the histogram of the residuals can be produced in SPSS by selecting the appropriate boxes in the "Plots" options of the "Regression" dialog box; a Python version is sketched below.
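SPSS produces a P-P plot; the sketch below uses a histogram with a Normal overlay and a Q-Q plot, which serves the same diagnostic purpose (reusing the Model 2 fit m2 from the simulated example):

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as st

    resid = m2.resid                       # residuals from the Model 2 fit above
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.hist(resid, bins=15, density=True)
    grid = np.linspace(resid.min(), resid.max(), 200)
    ax1.plot(grid, st.norm.pdf(grid, resid.mean(), resid.std()))  # Normal density overlay
    st.probplot(resid, plot=ax2)           # Q-Q plot, a close cousin of the P-P plot
    plt.show()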

Social Workers example: E(ln(Y)) = β0 + β1x
[Histogram of residuals with Normal curve: the histogram should match the continuous line. Normal P-P plot: the points should be as close as possible to the straight line.]
Neither graph shows strong departures from the Normality assumption.