Assumptions underlying regression analysis

SADC Course in Statistics: Assumptions underlying regression analysis (Session 05)

Learning Objectives At the end of this session, you will be able to: describe the assumptions underlying a regression analysis; conduct analyses that allow a check on model assumptions; discuss the consequences of failure of assumptions; consider remedial action when assumptions fail.

Checking assumptions Describing the relationship carries no assumptions. However, inferences concerning the slope of the line, e.g. by use of t-tests or F-tests, are subject to certain assumptions. Checking assumptions is important to avoid pitfalls associated with making invalid conclusions.

Assumptions The simple linear regression model is: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. In addition to assuming a linear form for the model, the $\varepsilon_i$ are assumed to be independent, with zero mean and constant variance $\sigma^2$, and to be normally distributed. Note: Model predictions, often called fitted values, are $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
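The fitted values and residuals in this model can be computed directly from the least-squares formulas. The following is a minimal sketch, not the course's own software: the function name `fit_line` and the data are made up for illustration and are not the paddy data used in the session.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data (hypothetical, not from the session).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # model predictions
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus fitted

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model, the residuals always sum to zero, so it is their pattern and spread that carry information about the assumptions.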

How to check assumptions? The usual approach is to conduct a residual analysis. Residuals are deviations of observed values from model fitted values. [Figure: paddy data relating yield to fertiliser, with a residual marked.]

Residual Plots Plotting residuals in various ways allows failure of assumptions to be detected. e.g. To check the normality assumption, plot a histogram of the residuals (provided there are enough observations); or do a normal probability plot of residuals. A straight line plot indicates that the normality assumption is reasonable.
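The coordinates of a normal probability plot can be sketched as follows. This is an illustration under stated assumptions: the residuals are made up, the helper name `normal_plot_points` is hypothetical, and the plotting positions (i - 0.5)/n are only one of several conventions in use.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each ordered residual with an expected standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Made-up residuals for illustration only.
residuals = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.1, -0.9]
points = normal_plot_points(residuals)

for q, r in points:
    print(f"normal quantile {q:6.2f}   residual {r:6.2f}")
```

Plotting the second column against the first gives the normal probability plot; points lying close to a straight line support the normality assumption.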

Residual Plots - continued Most useful is a plot of residuals against fitted values ($\hat{y}_i$). It helps to detect failure of the variance homogeneity assumption and to identify potential outliers. For example, if standardised residuals are used, i.e. residuals divided by their standard error, then about 95% of observations would be expected to lie between –2 and +2. A random scatter with no obvious pattern is good! Some examples follow….
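The ±2 rule for standardised residuals can be sketched as below. This is a simplified version that divides every residual by the same residual standard error and ignores leverage, which statistical packages usually account for. The residuals are made up and the function name is hypothetical.

```python
from math import sqrt

def standardised_residuals(residuals, n_params=2):
    """Divide residuals by the residual standard error s = sqrt(SSE / (n - p)).

    Simplification (an assumption of this sketch): leverage is ignored,
    so every residual is scaled by the same s.
    """
    n = len(residuals)
    s = sqrt(sum(e * e for e in residuals) / (n - n_params))
    return [e / s for e in residuals]

# Made-up residuals from a hypothetical straight-line fit (they sum to zero).
residuals = [0.4, -0.3, 0.1, -0.5, 0.2, 3.0, -0.6, -0.4, -0.9, -1.0]

std = standardised_residuals(residuals)
flagged = [i for i, r in enumerate(std) if abs(r) > 2]  # potential outliers

print("standardised:", [round(r, 2) for r in std])
print("indices outside +/-2:", flagged)
```

A flagged point is only a candidate for investigation, not automatically an error in the data.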

Some Residual Plots [Residual plot not reproduced.] A random scatter such as this one is good. It shows no obvious departures from the variance homogeneity assumption.

Some Residual Plots [Residual plot not reproduced.] Variance increases with increasing x. Could try a $\log_e(y)$ transformation.

Some Residual Plots [Residual plot not reproduced.] Indication that the response is a binomial proportion. Use a logistic regression model.

Some Residual Plots [Residual plot not reproduced.] Lack of linearity. Pattern indicates an incorrect model, probably due to a missing squared term.

Some Residual Plots [Residual plot not reproduced.] Presence of an outlier. Investigate whether there is a reason for this odd point.

Consequences of assumption failure Studies on the consequences of assumption failure have demonstrated that: tests and confidence intervals for means are relatively robust to small departures from normality; the effects of non-homogeneous variance can be large, but are less serious if sample sizes in different sub-groupings are equal; dependence of observations can badly affect F-tests.

Dealing with assumption failure One approach is to find a transformation that will stabilize the variance. Some typical transformations are: taking logs (useful when there is skewness); square root transformation; reciprocal transformation. Sometimes theoretical grounds will determine the transformation to use, e.g. when data are Poisson or Binomial. However, in such cases, exact methods of analysis will be preferable.
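The effect of a log transformation on residual spread can be illustrated with a small synthetic example. Everything here is an assumption made for illustration: the data are generated from a multiplicative model with a fixed, repeating error pattern (so the result is reproducible), and the helper names are hypothetical. In this setup the log transformation also straightens the relationship, which will not always be the case with real data.

```python
from math import exp, log
from statistics import mean, pstdev

def line_residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

def spread_ratio(residuals):
    """Residual spread in the upper half of x relative to the lower half.

    Assumes the residuals are listed in ascending order of x.
    """
    half = len(residuals) // 2
    return pstdev(residuals[half:]) / pstdev(residuals[:half])

# Synthetic data: errors act multiplicatively, so spread grows with the mean.
pattern = [0.2, -0.1, -0.2, 0.1, 0.0]   # fixed stand-in for random error
x = [0.1 * i for i in range(1, 41)]
y = [exp(1.0 + 0.5 * xi + pattern[i % 5]) for i, xi in enumerate(x)]

raw_ratio = spread_ratio(line_residuals(x, y))
log_ratio = spread_ratio(line_residuals(x, [log(v) for v in y]))

print(f"spread ratio, raw y:  {raw_ratio:.2f}")
print(f"spread ratio, log(y): {log_ratio:.2f}")
```

A ratio close to 1 suggests similar residual spread at low and high values of x; a ratio well above 1 suggests that variance increases with x.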

Dealing with non-independence The assumption of independence is quite critical. Some attention to this is needed at the data collection stage. If observations are collected in time or space, plotting residuals in time (or space) order may reveal that subsequent observations are correlated. Techniques similar to those used in time series analysis or analysis of repeated measurements data may be more appropriate.
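Alongside a plot of residuals in time order, a simple numerical summary is the lag-1 autocorrelation of the residuals. This is a supplementary check rather than a method named in the session. The two residual series below are made up to show the contrast, and the function name is hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one before it (time order)."""
    n = len(residuals)
    m = sum(residuals) / n
    num = sum((residuals[t] - m) * (residuals[t - 1] - m) for t in range(1, n))
    den = sum((e - m) ** 2 for e in residuals)
    return num / den

# Made-up residuals listed in the order the observations were collected.
drifting = [-1.0, -0.8, -0.5, -0.1, 0.3, 0.6, 0.9, 0.6]     # slow drift over time
alternating = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # sign flips each step

r_drift = lag1_autocorrelation(drifting)
r_alt = lag1_autocorrelation(alternating)

print(f"lag-1 autocorrelation, drifting series:    {r_drift:.2f}")
print(f"lag-1 autocorrelation, alternating series: {r_alt:.2f}")
```

Values far from zero in either direction suggest that successive observations are not independent, in which case time series or repeated measurements methods may be more appropriate.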

An illustration – using paddy data Histogram of standardised residuals after fitting a linear regression of yield on fertiliser. This is a check on the normality assumption.

A normal probability plot… Another check on the normality assumption Do you think the points follow a straight line?

Std. residuals versus fitted values Checking assumption of variance homogeneity, and identification of outliers: Do you judge this to be a random scatter? Are there any outliers?

Conclusion: The residual plots showed no evidence of departures from the model assumptions. We may conclude that fertiliser does contribute significantly to explaining the variability in paddy yields. Note: Always conduct a residual analysis after fitting a regression model. The same concepts carry over to more complex models that may be fitted.

Practical work follows to ensure learning objectives are achieved…