STA305 week 3: Assessing Model Adequacy

Presentation transcript:

STA305 week 3, slide 1: Assessing Model Adequacy. A number of assumptions were made about the model, and these need to be verified before the model is used for inference. When some assumptions are violated, the results of the ANOVA are difficult to interpret. The assumptions for the one-factor model, and for most of the models we will discuss in the future, consist of an assumption about the form of the model as well as assumptions about the errors.

STA305 week 3, slide 2: Assumptions for the One-Factor Model.
- Model form: the model assumes that the mean response within each treatment group is of the form $E(Y_{ij}) = \mu + \tau_i$.
- Independence: the model assumes that the errors $\varepsilon_{ij}$ are independent of each other.
- Homoscedasticity: the model assumes that the $\varepsilon_{ij}$ have a common variance $\sigma^2$.
- Normality: the model assumes that the $\varepsilon_{ij}$ have a normal distribution with mean 0 and constant variance $\sigma^2$.
Note that the model might be adequate for most observations, but outliers (unusual observations) might have a large impact on parameter estimates and hypothesis tests. Therefore, besides checking the model assumptions, we should also check for potential outliers.

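STA305 week 3, slide 3: Residuals. Most of the assumptions that need to be validated involve some aspect of the errors $\varepsilon_{ij}$. Notice that $\varepsilon_{ij} = Y_{ij} - \mu - \tau_i$. These errors can be estimated by the observed residuals, obtained by replacing $\mu + \tau_i$ with its estimate, the treatment sample mean $\bar{Y}_{i\cdot}$: $e_{ij} = Y_{ij} - \hat{\mu} - \hat{\tau}_i = Y_{ij} - \bar{Y}_{i\cdot}$. We often use standardized residuals, $z_{ij}$, either instead of or alongside the residuals. Standardized residuals are given by $z_{ij} = e_{ij} / \sqrt{SSE/(n-1)}$, where $SSE = \sum_i \sum_j e_{ij}^2$ and $n$ is the total number of observations. The standardized residuals have mean 0 and variance 1, which makes them easier to use for detecting outliers.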
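As a short check of the claim that standardized residuals have mean 0 and variance 1, the following derivation is a sketch. It assumes that residuals are deviations from treatment sample means and that standardization divides by the sample standard deviation of the residuals, here called $s_e$ (a symbol introduced only for this illustration).

```latex
% Residuals are deviations from treatment means, so they sum to zero:
%   e_{ij} = Y_{ij} - \bar{Y}_{i\cdot}, \qquad \sum_i \sum_j e_{ij} = 0 .
% Their sample variance therefore needs no mean correction:
\[
  s_e^2 = \frac{1}{n-1} \sum_i \sum_j e_{ij}^2 = \frac{SSE}{n-1}.
\]
% Dividing each residual by s_e gives
\[
  z_{ij} = \frac{e_{ij}}{s_e}, \qquad
  \bar{z} = \frac{1}{n\, s_e} \sum_i \sum_j e_{ij} = 0, \qquad
  \frac{1}{n-1} \sum_i \sum_j z_{ij}^2
    = \frac{1}{s_e^2} \cdot \frac{SSE}{n-1} = 1 .
\]
```

Because the $z_{ij}$ are on a common unit scale, a value far from 0 (for example, beyond about 3 in absolute value) stands out regardless of the units of the response.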

STA305 week 3, slide 4: Checking Model Form. To assess whether the model form was correctly specified, we plot the standardized residuals against the treatments (factor levels). If the model is adequate, residuals should be centered around 0 for each treatment group. Residuals that are not centered at 0, or that show any other non-random pattern, could indicate lack of fit. On slide 5, the graph on the left shows a residual plot for a case where the model form is correct. The graph on the right shows a residual plot for a case where the model has not been correctly specified.

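STA305 week 3, slide 5: [Figure: two plots of standardized residuals versus treatment. Left: model form correct. Right: model form incorrectly specified.]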

STA305 week 3, slide 6: Checking for Constant Variance. The plot of standardized residuals versus treatments discussed above can also be used to determine whether the variance is constant across treatments. The spread of the standardized residuals should be similar within each treatment group. For the data in both figures on slide 5, the variance appears to be constant. The graph on the next slide (slide 7) is for data in which the variance is not likely to be constant. We often use a rule of thumb to determine whether the constant variance assumption is valid or not. The rule of thumb is that the ratio of the largest treatment standard deviation to the smallest should be less than or equal to 3, i.e., $\max_i s_i / \min_i s_i \le 3$, where $s_i$ is the sample standard deviation within treatment $i$.

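STA305 week 3, slide 7: [Figure: standardized residuals versus treatment for data in which the variance is not likely to be constant.]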

STA305 week 3, slide 8: Another Aspect of Constant Variance. Sometimes the variance is non-constant, but the differences are due to the size of the observation rather than the treatment. The residuals might be larger (or smaller) for larger values of the response. A plot of the residuals, $e_{ij}$, versus the fitted values, $\hat{Y}_{ij}$, is useful for detecting this kind of non-constant variance.

STA305 week 3, slide 9: Variance-Stabilizing Transformations. When unequal variances are detected (heteroscedasticity), we might be able to transform the data so that the assumption of common variances holds. For example, in the graph on slide 7, it looks like the standard deviation is linearly related to the treatment level, that is, $\sigma_i = i\sigma$. In this case, $Y_{ij}/i$ should have constant variance. Generally, we must find some function $h$ so that the transformed responses $h(Y_{ij})$ have constant variance and all model assumptions are satisfied.
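The example on this slide can be worked out in one line. The sketch below assumes that the treatment levels are labelled by the numbers $i = 1, 2, \ldots$ and that the standard deviation in treatment $i$ is exactly $i\sigma$.

```latex
% Dividing a random variable by a constant c divides its variance by c^2.
\[
  \operatorname{Var}(Y_{ij}) = \sigma_i^2 = i^2 \sigma^2
  \quad\Longrightarrow\quad
  \operatorname{Var}\!\left(\frac{Y_{ij}}{i}\right)
    = \frac{\sigma_i^2}{i^2}
    = \frac{i^2 \sigma^2}{i^2}
    = \sigma^2 .
\]
% The rescaled response Y_{ij}/i therefore has the same variance in every treatment.
```

The same idea motivates the general search for a function $h$: the transformation is chosen to cancel whatever relationship exists between the spread and the treatment (or the mean).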

STA305 week 3, slide 10: Checking for Outliers. Outliers are observations that are unusually large or unusually small. Outliers can be spotted on a plot of standardized residuals versus treatment. The previous graphs do not appear to have any outliers. The graph on the next slide (slide 11) contains several outliers.

STA305 week 3, slide 11: [Figure: standardized residuals versus treatment, with several outliers.]

STA305 week 3, slide 12: Checking for Error Term Dependence. Recall that the model requires the $\varepsilon_{ij}$ to be statistically independent across all observations. Dependence sometimes appears in experimental units that were tested close together in time and/or space. To examine whether this has happened, plot the residuals in time/space order. If the independence assumption is satisfied, there should be no pattern in the residuals. The left graph on slide 13 is a residual plot with no pattern over the order, while the right graph shows a case where there might be dependence.

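STA305 week 3, slide 13: [Figure: two plots of residuals in time/space order. Left: no pattern. Right: possible dependence.]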

STA305 week 3, slide 14: Checking for Normality. If the model assumption about the normality of the errors holds, then a Q-Q plot (or normal probability plot) of the standardized residuals should be approximately a straight line. We can also plot a histogram or stem-and-leaf plot of the standardized residuals to check the normality assumption.

STA305 week 3, slide 15: Using SAS to Generate Plots. An example will be used to demonstrate how to generate residual plots using SAS. An experiment was conducted to compare the effects of auditory versus visual signals on the speed of response of human subjects. A computer was used to present a stimulus. The reaction time required by the subject to press a key was recorded. Subjects were given either a visual or an auditory signal that the stimulus was coming. The time between cue and stimulus was either 5, 10, or 15 seconds.

STA305 week 3, slide 16: Thus, there were 6 treatments in total (2 signal types × 3 cue-to-stimulus times):

STA305 week 3, slide 17: The Data. In addition to recording subject response times, the order of testing was recorded as well. Response times are in seconds and order of testing is in brackets.

STA305 week 3, slide 18: Creating SAS Dataset. To create a SAS dataset in the usual manner, use the following code:

    data response ;
      input treatment response order ;
      cards ;
    ;
    run ;

STA305 week 3, slide 19: Fit Model and Obtain Residuals. Residuals are calculated by PROC GLM and can be output to a SAS dataset:

    PROC GLM DATA = response ;
      CLASS treatment ;
      MODEL response = treatment ;
      OUTPUT OUT = resid RESIDUAL = e P = fitted ;
    run ;

The OUTPUT statement creates a new dataset, called resid, containing the original data and two new variables: e (the residuals) and fitted (the fitted values).

STA305 week 3, slide 20: Getting the Standardized Residuals. Use PROC STANDARD to standardize the residuals obtained in the previous step:

    proc standard data = resid out = stdresid (rename=(e=z)) std = 1.0 ;
      var e ;
    run ;

This creates a new dataset called stdresid, and the standardized residuals are in a variable called z. Because the residuals already have mean 0, only the standard deviation needs to be rescaled.
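To see what the standardization does, the same quantity can be computed by hand and compared with z. The following is a minimal sketch that assumes the dataset resid (with residual variable e) from the PROC GLM step exists; the names zcheck and z_check are hypothetical and used only for illustration.

```sas
/* Sketch: divide each residual by the sample standard deviation of all residuals. */
/* Assumes the dataset resid (with variable e) created by the PROC GLM step.      */
proc sql;
  create table zcheck as              /* zcheck is a hypothetical dataset name */
    select *,
           e / std(e) as z_check      /* std() uses the n-1 divisor */
    from resid;
quit;
```

PROC SQL remerges the overall standard deviation onto every row (the log notes this), so z_check should agree with the variable z produced by PROC STANDARD.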

STA305 week 3, slide 21: Plot of Residuals versus Treatment. Several model assumptions can be checked by plotting standardized residuals versus treatment. The following code can be used to obtain this plot:

    proc gplot data = stdresid ;
      plot z * treatment / vref = 0 ;
    run ;
    quit ;

The resulting plot is given on the next slide.
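Where PROC GPLOT is not available, a similar plot can probably be drawn with the ODS Graphics procedure PROC SGPLOT. This is a sketch of an alternative, not the code used in the lecture, and it assumes the dataset stdresid from the previous step.

```sas
/* Sketch: standardized residuals versus treatment with a reference line at 0. */
/* Assumes the dataset stdresid (variables treatment and z) already exists.    */
proc sgplot data = stdresid;
  scatter x = treatment y = z;
  refline 0 / axis = y;               /* horizontal line at z = 0 */
run;
```

The reference line plays the same role as vref = 0 in the GPLOT version: residuals in each treatment group should scatter evenly around it.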

STA305 week 3, slide 22: [Figure: standardized residuals versus treatment for the reaction time data.]

STA305 week 3, slide 23: Results. Based on the plot in the previous slide, the following observations can be made:
- There are no outliers.
- The residuals are centered around 0 for each treatment.
- The range of standardized residuals is similar for each treatment.

STA305 week 3, slide 24: Checking for Constant Variance. To check for constant variance across treatment groups, use PROC MEANS to get the standard deviations:

    PROC MEANS data = stdresid ;
      CLASS treatment ;
      VAR z ;
    RUN ;

The output lists, for each treatment, the number of observations (N Obs) and the standard deviation (Std Dev) of z.

STA305 week 3, slide 25: Note that we can apply the rule of thumb for verifying constant variance to these standard deviations. The ratio of the largest to the smallest is less than 3, so the constant variance assumption is reasonable. Further, the plot doesn't suggest unequal variances, so it is OK to assume that the variance is constant across treatments.
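The rule-of-thumb ratio can also be computed directly rather than read off the PROC MEANS output. The following is a sketch under the assumption that the dataset stdresid exists as created earlier; the names sd_by_trt, sd, and sd_ratio are hypothetical.

```sas
/* Sketch: ratio of the largest to the smallest treatment standard deviation. */
/* Assumes the dataset stdresid (variables treatment and z) already exists.   */
proc sql;
  create table sd_by_trt as           /* one row per treatment */
    select treatment, std(z) as sd
    from stdresid
    group by treatment;

  select max(sd) / min(sd) as sd_ratio   /* compare with the threshold of 3 */
    from sd_by_trt;
quit;
```

Since every z is the residual divided by the same constant, the ratio is the same whether it is computed from z or from the raw residuals e.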

STA305 week 3, slide 26: Plot Residuals versus Fitted Values. A second check for constant variance involves plotting standardized residuals versus fitted values. Fitted values were obtained from PROC GLM and are contained in the dataset stdresid. Use the following code to obtain the graph:

    proc gplot data = stdresid ;
      plot z * fitted / vref = 0 ;
    run ;
    quit ;

The plot of residuals versus fitted values is given in the next slide. The residuals do not appear to increase or decrease with the fitted value; therefore, it is OK to assume constant variance.

STA305 week 3, slide 27: [Figure: standardized residuals versus fitted values.]

STA305 week 3, slide 28: Plot Residuals versus Test Order. To check for possible dependence of the error terms, plot standardized residuals versus the order in which the observations were obtained. Use the following code:

    proc gplot data = stdresid ;
      plot z * order / vref = 0 ;
    run ;
    quit ;

The corresponding plot is given in the next slide. It does not suggest any dependence due to testing order.

STA305 week 3, slide 29: [Figure: standardized residuals versus order of testing.]

STA305 week 3, slide 30: Normal Probability Plot. Normal probability plots can be obtained from PROC UNIVARIATE in SAS:

    PROC UNIVARIATE DATA = stdresid plots ;
      var z ;
      probplot / normal (mu=0 sigma=1) square ;
    RUN ;

The plot generated by these statements is given in the next slide. Since the plotted points fall close to a straight line, the assumption of normality appears reasonable.
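The histogram and Q-Q plot mentioned earlier as alternative normality checks can be requested from the same procedure. This is a sketch that assumes the dataset stdresid exists; it is an addition for illustration, not part of the lecture's code.

```sas
/* Sketch: histogram and Q-Q plot of the standardized residuals,      */
/* each compared with a standard normal distribution.                 */
/* Assumes the dataset stdresid (variable z) already exists.          */
proc univariate data = stdresid noprint;   /* noprint suppresses the summary tables */
  var z;
  histogram z / normal(mu = 0 sigma = 1);  /* overlays the N(0,1) density */
  qqplot z / normal(mu = 0 sigma = 1);     /* adds the N(0,1) reference line */
run;
```

With few observations per treatment the histogram can look ragged even when normality holds, so the Q-Q plot is usually the easier of the two to judge.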

STA305 week 3, slide 31: [Figure: normal probability plot of the standardized residuals.]

STA305 week 3, slide 32: Concluding Remarks. None of the assumptions was violated in this example. Hence, it is OK to fit the model and proceed with inference. If outliers had been found, we would want to fit the model with and without the outliers and compare the inferences. If the variance were not constant, we would need to transform the data before going ahead with inference. If the data are not normal, we might be able to transform them so that they are normal. Also, if the data are not normal, we could use nonparametric methods for inference (not yet discussed in the course).
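As an illustration of the transformation remedy, the sketch below refits the model on a transformed response. The choice of the natural log is purely an assumption for the example (the lecture does not prescribe a transformation for these data), and the names response_log, log_response, and resid_log are hypothetical.

```sas
/* Hypothetical remedy: log-transform the response, then refit the model. */
/* Assumes the dataset response from the data step exists.                */
data response_log;
  set response;
  log_response = log(response);       /* natural log; requires response > 0 */
run;

proc glm data = response_log;
  class treatment;
  model log_response = treatment;
  output out = resid_log residual = e p = fitted;
run;
quit;
```

The residuals in resid_log would then go through the same checks as before (standardize, plot against treatment, fitted values, and order, and examine normality) before any inference is drawn on the transformed scale.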