

REGRESSION MODEL ASSUMPTIONS

The Regression Model
We have hypothesized that: y = β_0 + β_1 x + ε, where β_0 + β_1 x is the regression part and ε is the error part.
So far we have focused on the regression part: getting the best estimates for the β's.
Here we focus on the error term, ε.

THE RANDOM VARIABLE, ε
The error term, ε, is a random variable that describes how the observed values, y_i, vary around the regression line.
For any value of x, ε has a distribution with a mean and a standard deviation.
At any x value x_i, the observed value of the error term is called its residual, given by: e_i = y_i − ŷ_i, where ŷ_i is the value predicted by the fitted regression line at x_i.

STEP 3: 4 ASSUMPTIONS ABOUT ε
The remainder of our discussion about linear regression assumes the following about ε:
(1) DISTRIBUTION: ε is normally distributed.
(2) MEAN: The errors average out to 0, i.e. E(ε) = μ = 0.
(3) STANDARD DEVIATION: The standard deviation, σ, is the same at all values of x.
(4) INDEPENDENCE: The errors are independent of each other.

What Do These Assumptions Imply About y?
y = β_0 + β_1 x + ε.
β_0 + β_1 x is a constant for a given value of x.
ε is normally distributed with mean 0 and standard deviation σ.
Thus y is normally distributed with standard deviation σ and mean E(y):
E(y) = E(β_0 + β_1 x + ε) = E(β_0 + β_1 x) + E(ε) = β_0 + β_1 x + 0 = β_0 + β_1 x
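The implication for y can be checked by simulation. The sketch below is only an illustration: the values of BETA_0, BETA_1, and SIGMA and the choice x = 5 are hypothetical, not taken from the slides. It draws many values of y at one fixed x and compares the sample mean and standard deviation with β_0 + β_1 x and σ.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA_0, BETA_1, SIGMA = 10.0, 2.0, 3.0


def simulate_y(x, n_draws, seed=0):
    """Draw n_draws values of y = BETA_0 + BETA_1*x + eps, with eps ~ N(0, SIGMA)."""
    rng = random.Random(seed)
    return [BETA_0 + BETA_1 * x + rng.gauss(0.0, SIGMA) for _ in range(n_draws)]


x = 5.0
ys = simulate_y(x, 20000)

print("theoretical mean E(y):", BETA_0 + BETA_1 * x)
print("sample mean of y:     ", statistics.mean(ys))
print("theoretical std dev:  ", SIGMA)
print("sample std dev of y:  ", statistics.stdev(ys))
```

With a large number of draws, the sample mean should land close to β_0 + β_1 x and the sample standard deviation close to σ, whatever x is used.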

BEST ESTIMATE FOR σ
The true value of σ is unknown. It can be estimated by s as follows: s = √(SSE/(n − 2)), where SSE = Σ(y_i − ŷ_i)² is the sum of the squared residuals.

Hand Calculation of SSE
Table columns: x_i, y_i, ŷ_i, (y_i − ŷ_i), (y_i − ŷ_i)². The SUM of the last column is SSE.
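The hand calculation can be sketched in a few lines. The data below are hypothetical (they are not the Dollar Only data from the slides), and fit_line is a helper written for this example. The code fits the least squares line, builds the rows of the table, sums the squared residuals to get SSE, and then computes s = √(SSE/(n − 2)).

```python
import math


def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse_table(xs, ys):
    """Return rows (x, y, y_hat, e, e^2) and their SSE."""
    b0, b1 = fit_line(xs, ys)
    rows = []
    for x, y in zip(xs, ys):
        y_hat = b0 + b1 * x
        e = y - y_hat
        rows.append((x, y, y_hat, e, e * e))
    sse = sum(row[4] for row in rows)
    return rows, sse


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

rows, sse = sse_table(xs, ys)
s = math.sqrt(sse / (len(xs) - 2))

for x, y, y_hat, e, e2 in rows:
    print(f"{x:4} {y:4} {y_hat:7.2f} {e:7.2f} {e2:7.2f}")
print("SSE =", round(sse, 4))
print("s   =", round(s, 4))
```

For these five points the fitted line is ŷ = 2.2 + 0.6x, so SSE = 2.4 and s = √0.8 ≈ 0.894.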

[Figure: Excel regression output with annotations marking s, the Residual (error, ε) row, SSE, and SSE/(n − 2) = s².]

Checking the Assumptions
Many times it is simply assumed that the assumptions hold. We now show how to check them.

Residuals
The assumptions for ε can be checked using RESIDUAL ANALYSIS.
A residual, e_i, is the observation of ε at an observed value of x, x_i.
For example, in the Dollar Only example: y_1 = 101,000 when x_1 = 1200

Standardized Residuals
Is a residual of -8 large? It depends on the size of the standard error, s.
Standardized residual = e_i / (standard error of e_i for x_i).
Standardized residuals are easier to use to test the assumptions.
There are two typical ways of calculating the standard error of e_i for a particular x_i value; both approaches yield substantially the same results.

Standardized Residuals in Excel
Excel uses its own formula for the standard error of e_i. This still gives approximately the same values as the other methods.
We will use the ones generated by Excel to check the assumptions.
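The slide formulas did not survive in this transcript, so the sketch below rests on assumptions. It compares three commonly used denominators: s itself, the leverage-adjusted s·√(1 − h_i) with h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)², and √(SSE/(n − 1)), which is the form usually attributed to Excel's regression tool. The data and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


def standardized_residuals(xs, ys):
    """Return three variants of standardized residuals (all are assumptions)."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - 2))

    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    simple = [e / s for e in residuals]  # e_i / s
    adjusted = [e / (s * math.sqrt(1 - h))  # e_i / (s * sqrt(1 - h_i))
                for e, h in zip(residuals, leverages)]
    excel_like = [e / math.sqrt(sse / (n - 1)) for e in residuals]  # assumed Excel form
    return simple, adjusted, excel_like


# Hypothetical data for illustration only.
xs = list(range(1, 11))
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.7, 15.2, 16.9, 19.0, 21.1]

simple, adjusted, excel_like = standardized_residuals(xs, ys)
for a, b, c in zip(simple, adjusted, excel_like):
    print(f"{a:7.3f} {b:7.3f} {c:7.3f}")
```

The three columns differ only by scale factors close to 1 once n is moderately large, which is why the choice rarely changes the conclusions drawn from the residuals.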

Checking to See if Errors (Residuals) Appear to Come From a Normal Distribution
TWO WAYS TO CHECK
(1) Construct a plot of standardized residuals and see if they look normal.
- Could use Histogram from Data Analysis.
- A “quick check”: standardized residuals are like z-values. Check to see if about 68% are between ±1, 95% between ±2, and virtually all between ±3.
(2) Look at a normal probability plot. These are statistical plots to check for “normality”. A “perfect” normal distribution would be a straight line on such a plot.
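Both checks are easy to sketch. The standardized residuals below are hypothetical, and the plotting positions (i − 0.5)/n are one common convention assumed for this example, not necessarily the one Excel uses. The first function does the "quick check"; the second returns the points of a normal probability plot, which should fall near a straight line if the residuals are roughly normal.

```python
from statistics import NormalDist


def fraction_within(z_values, k):
    """Fraction of standardized residuals with absolute value at most k."""
    return sum(1 for z in z_values if abs(z) <= k) / len(z_values)


def normal_plot_points(z_values):
    """Pair each sorted residual with its expected standard normal score.

    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    n = len(z_values)
    ordered = sorted(z_values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, ordered))


# Hypothetical standardized residuals for illustration only.
z = [-1.6, -0.9, -0.4, -0.1, 0.2, 0.3, 0.7, 1.1, 1.4, -0.6]

for k in (1, 2, 3):
    print(f"within ±{k}: {fraction_within(z, k):.0%}")

for score, value in normal_plot_points(z):
    print(f"{score:7.3f} {value:7.3f}")
```

With only a handful of residuals the percentages can only be roughly 68% and 95%, so the quick check is a rough guide, not a formal test.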

Checking to See if σ Is Constant
Look at the residual plot to see if the points seem more spread out at some x's than at others. In the Dollar Only example, it did not appear so on the Excel residual plot.
Constant σ is called homoscedasticity!
If the points had looked like the next page, where there is less variation at lower values of x than at higher values, the constant variation assumption would have been violated. This is called heteroscedasticity!

[Figure: residuals e plotted against x. Heteroscedasticity: Nonconstant Variance]
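The slides check for this pattern by eye. As a rough numerical companion (an informal check added here, not a method from the slides and not a formal test), one can compare the spread of the residuals in the lower half of the x values with the spread in the upper half. The function name and data are hypothetical.

```python
import statistics


def spread_ratio(xs, residuals):
    """Std dev of residuals in the upper half of x divided by that in the lower half.

    Needs at least four points; the lower-half residuals must not all be equal.
    """
    ordered = [e for _, e in sorted(zip(xs, residuals))]  # sort residuals by x
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return statistics.stdev(upper) / statistics.stdev(lower)


# Hypothetical residuals for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
fanning_out = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.4]
even_spread = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]

print("fanning out:", round(spread_ratio(xs, fanning_out), 2))
print("even spread:", round(spread_ratio(xs, even_spread), 2))
```

A ratio far from 1 suggests that the variation changes with x; a ratio near 1 is consistent with constant σ, though the residual plot remains the primary check.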

Checking Independence
This is mainly for time series data (i.e. the x-axis is time) used in forecasting.
But basically, if the data looks like the next slide, the errors are not independent.
- In this case, whether you have a positive or negative error (residual) depends on the x-value.
- This is called autocorrelation.

[Figure: Y plotted against X = time. Example of Autocorrelation (Errors are Dependent on x)]
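The slides rely on a visual check for autocorrelation. One standard numerical summary, added here as an illustration and not taken from the slides, is the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The residual series below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series for illustration only.
long_runs = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]    # same sign persists
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every step

print("long runs:  ", durbin_watson(long_runs))
print("alternating:", durbin_watson(alternating))
```

Long runs of residuals with the same sign, as in the autocorrelation figure, push the statistic well below 2.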

Residual Analysis in Excel
CHECK: Residuals, Standardized Residuals, Residual Plots, Normal Probability Plots

Standardized Residuals
70% are between ±1 and 100% are between ±2: “close” to the expected normal values.
Residual values appear to average out to 0 everywhere.
There is no discernible pattern for the errors.

Normal Probability Plot
The following is the normal probability plot generated by Excel. Again Excel does it “slightly wrong”, but it should give us a good idea.
It looks close to a straight line, so the normality assumption appears valid.

Review
4 assumptions about ε:
1. ε is normal.
2. μ = E(ε) = 0.
3. σ is the same for all values of x.
4. Errors are independent.
Checking the Assumptions
- Check the residual plot to see if the variation changes for different values of x.
- Check the normality assumption with a normal probability plot or by creating a histogram of standardized residuals. Does it appear normal and centered around 0? Are about 68% between ±1, 95% between ±2, and almost all between ±3?