Topic 23: Diagnostics and Remedies. Outline Diagnostics –residual checks ANOVA remedial measures.

Presentation transcript:

Topic 23: Diagnostics and Remedies

Outline: Diagnostics (residual checks); ANOVA remedial measures

Diagnostics Overview: We will take the diagnostics and remedial measures that we learned for regression and adapt them to the ANOVA setting. Many things are essentially the same; some things require modification.

Residuals: Predicted values are the cell means, $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$. Residuals are the differences between the observed values and the cell means, $e_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$.
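The later plotting code uses a data set a2 that contains a variable resid, but the step that creates it is not shown. A minimal sketch of how the cell means and residuals could be saved, assuming the raw rust inhibitor data sit in a data set named a1 with variables eff and abrand (the name a1 and the variable muhat are assumptions made for illustration):

```sas
/* Sketch only: a1 and muhat are assumed names, not taken from the slides */
proc glm data=a1;
  class abrand;                    /* factor: brand of rust inhibitor */
  model eff=abrand;                /* one-way ANOVA, cell means are the fits */
  output out=a2 p=muhat r=resid;   /* p= predicted values, r= residuals */
run;
```

In a one-way model the predicted value for every observation in a level is that level's sample mean, so resid is the observation minus its cell mean.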

Basic plots: Plot the data vs the factor levels (the values of the explanatory variables). Plot the residuals vs the factor levels. Construct a normal quantile plot and/or histogram of the residuals.

KNNL Example (KNNL p 777): Compare 4 brands of rust inhibitor (X has r=4 levels). The response variable is a measure of the effectiveness of the inhibitor. There are 10 units per brand (n=10).

Plots: Data versus the factor; residuals versus the factor; normal quantile plot of the residuals.

Plots vs the factor
symbol1 v=circle i=none;
proc gplot data=a2;
  plot (eff resid)*abrand;
run;
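The third basic plot, the normal quantile plot of the residuals, has no code on the slides. One way it might be produced is with PROC UNIVARIATE, assuming the residuals are stored as resid in a2, as in the plotting code:

```sas
/* Sketch: normal quantile plot and histogram of the ANOVA residuals */
proc univariate data=a2 noprint;
  var resid;
  qqplot resid / normal(mu=est sigma=est);   /* reference line from estimated mean and sd */
  histogram resid / normal;                  /* histogram with fitted normal curve */
run;
```

Points that bend away from the reference line suggest non-normal errors.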

Data vs the factor: Means look different … common spread in Y’s.

Residuals vs the factor: Odd distribution of points.

QQ-plot: The shape is due to the odd spread (a lack of spread in places and a large spread in others). Can try a nonparametric analysis (see the last slides).

General Summary: Look for outliers, variance that depends on level, and non-normal errors. Plot residuals vs time and other variables if available.

Homogeneity tests: Homogeneity of variance (homoscedasticity). $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_r^2$ versus $H_1$: not all $\sigma_i^2$ are equal. Several significance tests are available.

Homogeneity tests: The text discusses Hartley and modified Levene. SAS has several, including Bartlett’s (essentially the likelihood ratio test) and several versions of Levene.

Homogeneity tests: There is a problem with assumptions. ANOVA is robust with respect to moderate deviations from Normality. ANOVA results can be sensitive to the homogeneity of variance assumption. Some homogeneity tests are sensitive to the Normality assumption.

Levene’s Test: Do ANOVA on the squared residuals from the original ANOVA. Modified Levene’s test uses absolute values of the residuals. Modified Levene’s test is recommended. Another quick and dirty rule of thumb
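As a worked form of the description above, the version of the test based on absolute residuals can be written as a one-way ANOVA F statistic on the absolute deviations. The symbols $d_{ij}$, $F_L$, and $n_T$ (total sample size) are notation introduced here for illustration. Note that KNNL's Brown-Forsythe variant measures deviations from the level medians rather than the level means.

```latex
d_{ij} = \left| Y_{ij} - \bar{Y}_{i\cdot} \right|,
\qquad
F_L = \frac{\sum_{i=1}^{r} n_i \left( \bar{d}_{i\cdot} - \bar{d}_{\cdot\cdot} \right)^2 / (r-1)}
           {\sum_{i=1}^{r} \sum_{j=1}^{n_i} \left( d_{ij} - \bar{d}_{i\cdot} \right)^2 / (n_T - r)}
```

Under $H_0$, $F_L$ is compared with an $F(r-1,\ n_T-r)$ distribution; large values indicate that the spread differs across levels.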

KNNL Example (KNNL p 785): Compare the strengths of 5 types of solder flux (X has r=5 levels). The response variable is the pull strength, the force in pounds required to break the joint. There are 8 solder joints per flux (n=8).

Scatterplot

Levene’s Test
proc glm data=a1;
  class type;
  model strength=type;
  means type / hovtest=levene(type=abs);
run;
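The other homogeneity tests mentioned earlier can be requested through the same HOVTEST= option. A sketch for the same soldering data, assuming one MEANS statement is used per test:

```sas
/* Sketch: alternative homogeneity-of-variance tests in PROC GLM */
proc glm data=a1;
  class type;
  model strength=type;
  means type / hovtest=bartlett;             /* Bartlett's test */
  means type / hovtest=bf;                   /* Brown-Forsythe: absolute deviations from medians */
  means type / hovtest=levene(type=square);  /* Levene on squared residuals */
run;
```

Comparing the p-values across tests gives a feel for how much the conclusion depends on the Normality assumption.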

ANOVA Table
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Common variance estimated to be 2.11

Output: Levene's Test, ANOVA of Absolute Deviations
Source   DF   F Value   Pr > F
type
Error    35
We reject the null hypothesis and assume nonconstant variance.

Means and SDs
Level of type   N   strength Mean   strength Std Dev

Remedies: Delete outliers (is their removal important?). Use weights (weighted regression). Transformations. Nonparametric procedures.

What to do here? There are not really any obvious outliers. We do not see a pattern of increasing or decreasing variance, or skewed distributions. We will consider weighted ANOVA and mixed model ANOVA.

Weighted least squares: We used this with regression. We obtained a model for how the sd depends on the explanatory variable (plotted absolute value of residual vs x), then used weights inversely proportional to the estimated variance.

Weighted Least Squares: Here we can compute the variance for each level and use the reciprocals of these variances as weights in PROC GLM. We will illustrate with the soldering example from KNNL.

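Obtain the variances and weights
proc means data=a1;   /* BY processing assumes a1 is sorted by type */
  var strength;
  by type;
  output out=a2 var=s2;
run;
data a2;
  set a2;
  wt=1/s2;   /* weight = reciprocal of the level's sample variance */
run;
NOTE. Data set a2 has 5 cases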

Proc Means Output
Level of type   N   strength Mean   strength Std Dev

Merge and then use the weights in PROC GLM
data a3;
  merge a1 a2;
  by type;
run;
proc glm data=a3;
  class type;
  model strength=type;
  weight wt;
  lsmeans type / cl;
run;

Output
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Data have been standardized to have a variance of 1

LSMEANS Output
type   strength LSMEAN   Standard Error   Pr > |t|   95% Confidence Limits
Because of the weights, the standard errors are simply based on the sample variance of each level.
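Both remarks (error variance of 1, and standard errors driven by each level's own sample variance) follow from the weighted least squares criterion. A short derivation, using $w_i = 1/s_i^2$ as in the data step and writing $Q_w$, $SSE_w$, $MSE_w$ for the weighted quantities (notation introduced here for illustration):

```latex
Q_w = \sum_{i=1}^{r} \sum_{j=1}^{n_i} w_i \left( Y_{ij} - \mu_i \right)^2
\quad\Longrightarrow\quad
\hat{\mu}_i = \bar{Y}_{i\cdot}

SSE_w = \sum_{i=1}^{r} \frac{1}{s_i^2} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i\cdot} \right)^2
      = \sum_{i=1}^{r} \frac{(n_i - 1)\, s_i^2}{s_i^2}
      = n_T - r
\quad\Longrightarrow\quad
MSE_w = \frac{SSE_w}{n_T - r} = 1

s\{\hat{\mu}_i\} = \sqrt{\frac{MSE_w}{n_i w_i}} = \frac{s_i}{\sqrt{n_i}}
```

So the weighted estimates are still the cell means, but each standard error uses only that level's sample standard deviation (here with $n_i = 8$).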

Mixed Model ANOVA: Relax the assumption of constant variance rather than including a “known” weight. This involves moving to a mixed model procedure. This topic will not be on the exam, but you should be aware of these model capabilities.
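Written out, relaxing the constant variance assumption means that each level gets its own error variance, estimated from the data rather than plugged in as a known weight. A sketch of the model in cell means form:

```latex
Y_{ij} = \mu_i + \varepsilon_{ij},
\qquad
\varepsilon_{ij} \overset{\text{indep}}{\sim} N\!\left(0,\ \sigma_i^2\right),
\qquad i = 1, \dots, r,\quad j = 1, \dots, n_i
```

The usual one-way ANOVA is the special case $\sigma_1^2 = \dots = \sigma_r^2 = \sigma^2$.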

SAS Code
proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type;
run;
This allows the variance to differ in each level, and a degrees of freedom adjustment is used to account for this.

GLIMMIX OUTPUT
Fit Statistics: -2 Res Log Likelihood; AIC, AICC, BIC, CAIC, HQIC (smaller is better); Generalized Chi-Square 35.00; Gener. Chi-Square / DF 1.00
Covariance Parameter Estimates (one row per level of type, five rows in all)
Cov Parm        Group   Estimate   Standard Error
Residual (VC)   type
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
type                                 <.0001
Really 3 groups of variances

SAS Code
proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type1;
run;
Type1 was created to identify Type 1 and 2, Type 3, and Type 4 and 5 as 3 groups.
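The data step that builds type1 is not shown. A minimal sketch of one way to create it, assuming type is a numeric variable coded 1 through 5 (the coding of type1 as 1, 2, 3 is also an assumption):

```sas
/* Sketch: collapse the five flux types into three variance groups */
data a1;
  set a1;
  if type in (1,2) then type1=1;    /* types 1 and 2 share a variance */
  else if type=3 then type1=2;      /* type 3 on its own */
  else type1=3;                     /* types 4 and 5 share a variance */
run;
```

Because type1 is not listed in the CLASS statement, it may be safest to keep the data sorted by type so that observations in the same group are contiguous.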

GLIMMIX OUTPUT
Fit Statistics: -2 Res Log Likelihood; AIC, AICC, BIC, CAIC, HQIC (smaller is better); Generalized Chi-Square 35.00; Gener. Chi-Square / DF 1.00
Covariance Parameter Estimates (one row per group, three rows in all)
Cov Parm        Group   Estimate   Standard Error
Residual (VC)   Grp
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
type                                 <.0001
Better BIC, but the same general conclusion about type.

Transformation Guides: When $\sigma_i^2$ is proportional to $\mu_i$, use $\sqrt{y}$. When $\sigma_i$ is proportional to $\mu_i$, use $\log(y)$. When $\sigma_i$ is proportional to $\mu_i^2$, use $1/y$. For proportions, use $\arcsin(\sqrt{y})$, written arsin(sqrt(y)) in a SAS data step. Box-Cox transformation.

Example: Consider the study on KNNL pg 790. Y: time between computer failures. X: three locations.
data a3;
  infile 'u:\.www\datasets512\CH18TA05.txt';
  input time location interval;
run;
symbol1 v=circle;
proc gplot;
  plot time*location;
run;

Scatterplot: Outlier or skewed distribution? Can consider a transformation first.

Box-Cox Transformation: Can regress the log of the level standard deviations on the log of the level means; with fitted slope $b_1$, the power to raise Y to is $1 - b_1$. Can try various “convenient” powers. Can use SAS directly to calculate the power.
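The reason $1 - b_1$ is the right power can be sketched with a first-order (delta method) approximation. Suppose the standard deviation grows as a power of the mean, and let $Y' = Y^{\lambda}$ be the transformed response (the symbol $\lambda$ and the proportionality assumption are introduced here for illustration):

```latex
\sigma_i \propto \mu_i^{\,b_1}
\quad\Longrightarrow\quad
\log \sigma_i = b_0 + b_1 \log \mu_i

\sigma\{Y^{\lambda}\} \approx \left| \lambda \right| \mu_i^{\,\lambda - 1}\, \sigma_i
\;\propto\; \mu_i^{\,\lambda - 1 + b_1}
\quad\Longrightarrow\quad
\text{constant in } \mu_i \text{ when } \lambda = 1 - b_1
```

This agrees with the transformation guides: $b_1 = 1/2$ gives the square root, $b_1 = 1$ gives $\lambda = 0$ (the log), and $b_1 = 2$ gives $\lambda = -1$ (the reciprocal).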

Fitted line: E(logsig) = b0 + b1 logmu (the fitted coefficients are not preserved here). Power should be ≈ 0.20.
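The regression of logsig on logmu could be set up as follows. This is a sketch under the assumption that a3 holds time and location as read in earlier; the data set name a4 and the variables mu and sig are names chosen here for illustration:

```sas
/* Sketch: regress log(sd) on log(mean) across the factor levels */
proc means data=a3 noprint nway;
  class location;
  var time;
  output out=a4 mean=mu std=sig;   /* one row per location */
run;
data a4;
  set a4;
  logmu=log(mu);
  logsig=log(sig);
run;
proc reg data=a4;
  model logsig=logmu;              /* slope b1; suggested power is 1 - b1 */
run;
```

With only three locations the line is fit to three points, so the resulting power is a rough guide rather than a precise estimate.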

Using SAS
proc transreg data=a3;
  model boxcox(time / lambda=-2 to 2 by .2) = class(location);
run;

Output: Box-Cox Transformation Information for time
Lambda   R-Square   Log Like
Legend: < - Best Lambda; * - 95% Confidence Interval; + - Convenient Lambda

Transforming data in SAS
data a3;
  set a3;
  transtime = time**0.20;
run;
symbol1 v=circle i=none;
proc gplot;
  plot transtime*location;
run;

Much more constant spread in data!

Nonparametric approach: Based on ranks. See KNNL section 18.7, p 795. See the SAS procedure NPAR1WAY.
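A minimal sketch of the NPAR1WAY call for the rust inhibitor data, assuming the response eff and the factor abrand are available in a2 as in the earlier plots:

```sas
/* Sketch: rank-based (Kruskal-Wallis) analysis of the rust inhibitor data */
proc npar1way data=a2 wilcoxon;
  class abrand;   /* factor with r=4 levels */
  var eff;        /* response */
run;
```

With more than two levels, the WILCOXON option reports the rank sums for each level and the Kruskal-Wallis chi-square test.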

Rust Inhibitor Analysis
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Highly significant F test. Even if there is a violation of Normality, the evidence is overwhelming.

Nonparametric Analysis
Wilcoxon Scores (Rank Sums) for Variable eff, Classified by Variable abrand
abrand   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
Average scores were used for ties.
Kruskal-Wallis Test
Chi-Square
DF                3
Pr > Chi-Square   <.0001
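For reference, the Kruskal-Wallis statistic behind this output can be written in terms of the mean rank $\bar{R}_{i\cdot}$ of each level, where ranks are assigned over all $n_T$ observations combined. This is the form without the tie correction; the symbol $H$ is notation introduced here:

```latex
H = \frac{12}{n_T (n_T + 1)} \sum_{i=1}^{r} n_i \left( \bar{R}_{i\cdot} - \frac{n_T + 1}{2} \right)^2
\;\overset{\text{approx}}{\sim}\; \chi^2_{\,r-1} \quad \text{under } H_0
```

With $r = 4$ brands the reference distribution has $r - 1 = 3$ degrees of freedom, matching the DF shown in the output.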

Last slide: We’ve finished most of Chapters 17 and 18. We used program topic23.sas to generate the output.