KNN Ch. 3 Diagnostics and Remedial Measures Applied Regression Analysis BUSI 6220.

1 KNN Ch. 3 Diagnostics and Remedial Measures Applied Regression Analysis BUSI 6220

2 Diagnostics for the Predictor Variable
- Dot plots
- Sequence plots
- Stem-and-leaf plots
These are used mainly to check for outlying observations, which will be useful in later diagnosis.

3 Residual Analysis
Why look at the residuals?
- Detect non-linearity of the regression function
- Detect heteroscedasticity (lack of constant variance)
- Detect auto-correlation
- Detect outliers
- Detect non-normality
- Check whether important predictor variables were left out
Regression model assumptions:
- Errors are independent (have zero covariance)
- Errors have constant variance
- Errors are normally distributed

4 Diagnostics for Residuals
Use residual plots to detect:
- Non-linearity of the regression function
- Heteroscedasticity
- Auto-correlation
- Outliers
- Non-normality
- Important predictor variables left out
PLOT OF RESIDUALS
1. Against the predictor (if there is only X1)
2. Absolute or squared residuals against the predictor
3. Against fitted values (for many Xi)
4. Against time
5. Against omitted predictor variables
6. Box plot
7. Normal probability plot

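5 Diagnostics for Residuals
Normal probability plot
Approximate expected value of the kth smallest residual: $\sqrt{MSE}\; z\!\left[\frac{k - 0.375}{n + 0.25}\right]$, where $z(A)$ is the $A \times 100$ percentile of the standard normal distribution.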

6 Tests involving Residuals
The Correlation Test for Normality
H0: The residuals are normal
HA: The residuals are not normal
- Compute the correlation between the ordered residuals $e_i$ and their expected values under normality.
- Use Table B.6.
- To conclude normality, the observed coefficient of correlation should be at least as large as the table value for the given level of significance.
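The correlation test can be sketched in a few lines. This is a minimal illustration, not the textbook's own code: it assumes a simple linear regression (two parameters), uses the usual approximation sqrt(MSE) * z[(k - 0.375)/(n + 0.25)] for the expected value of the kth smallest residual, and runs on made-up data. The function names are hypothetical. The resulting correlation still has to be compared with the critical value in Table B.6.

```python
import math
from statistics import NormalDist, mean

def ols_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return the residuals."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def expected_under_normality(resid):
    """Expected value of the kth smallest residual, k = 1..n (assumes 2 parameters)."""
    n = len(resid)
    root_mse = math.sqrt(sum(e * e for e in resid) / (n - 2))
    z = NormalDist().inv_cdf  # standard normal percentile function
    return [root_mse * z((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    abar, bbar = mean(a), mean(b)
    num = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - abar) ** 2 for ai in a) * sum((bi - bbar) ** 2 for bi in b))
    return num / den

def normality_correlation(resid):
    """Correlation between the ordered residuals and their expected values."""
    return pearson(sorted(resid), expected_under_normality(resid))

# Made-up illustration data (not from the textbook).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
resid = ols_residuals(x, y)
print("correlation:", round(normality_correlation(resid), 3))
```

Plotting sorted(resid) against expected_under_normality(resid) gives the normal probability plot itself; a nearly straight line corresponds to a correlation close to 1.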

7 Tests involving Residuals
Other Tests for Normality
H0: The residuals are normal
HA: The residuals are not normal
- Anderson-Darling (very powerful; may be used for small data sets, n < 25)
- Ryan-Joiner
- Shapiro-Wilk
- Kolmogorov-Smirnov

9 Tests involving Residuals (Constancy of Error Variance)
The Modified Levene Test
- Partitions the observations into two groups by the independent variable (high X values and low X values), then tests the null hypothesis H0: The groups have equal variances.
- Similar to a pooled-variance t-test for the difference in two means of independent samples.
- It is robust to departures from normality of the error terms.
- A large sample size is essential so that the dependencies of the residuals on each other can be neglected.
- Uses the group “median” instead of the “mean” (Why?)
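The test on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the split point defaults to the median of X (any sensible cut between low and high X would do), the function name is hypothetical, and the residuals are made up. The statistic is the pooled two-sample t computed on absolute deviations of the residuals from their group medians; it is compared with a t distribution on n - 2 degrees of freedom.

```python
import math
from statistics import mean, median

def modified_levene_t(x, resid, split=None):
    """Return (t_star, df) for the modified Levene test on two X groups."""
    if split is None:
        split = median(x)  # assumed cut point between low-X and high-X groups
    low = [e for xi, e in zip(x, resid) if xi <= split]
    high = [e for xi, e in zip(x, resid) if xi > split]
    # Absolute deviations of residuals around each group's median residual.
    med_low, med_high = median(low), median(high)
    d_low = [abs(e - med_low) for e in low]
    d_high = [abs(e - med_high) for e in high]
    n1, n2 = len(d_low), len(d_high)
    m1, m2 = mean(d_low), mean(d_high)
    # Pooled variance of the absolute deviations.
    pooled = (sum((d - m1) ** 2 for d in d_low)
              + sum((d - m2) ** 2 for d in d_high)) / (n1 + n2 - 2)
    t_star = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t_star, n1 + n2 - 2

# Made-up residuals whose spread grows with X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.2, -0.2, 2.0, -2.0, 3.0, -3.0]
t_star, df = modified_levene_t(x, resid)
print("t* =", round(t_star, 2), "df =", df)
```

A large |t*| relative to the t critical value suggests that the error variance is not constant. Using the median rather than the mean keeps a single extreme residual from dominating the deviations.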

10 Tests involving Residuals (Constancy of Error Variance)
The Modified Levene Test
Read “Comments” on page 118 and go through the Breusch-Pagan test on page 119.
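For orientation before reading page 119, here is a minimal sketch of the Breusch-Pagan statistic for a single predictor. It assumes the common form of the test: regress the squared residuals on X, take that regression's sum of squares SSR*, and compute (SSR*/2) / (SSE/n)^2, which is compared with a chi-square distribution on 1 degree of freedom. The function name and the data are made up for illustration.

```python
from statistics import mean

def breusch_pagan(x, resid):
    """Breusch-Pagan statistic for one predictor (compare with chi-square, 1 df)."""
    n = len(resid)
    sq = [e * e for e in resid]   # squared residuals
    sse = sum(sq)                 # SSE of the original regression
    # Regression sum of squares from regressing squared residuals on X.
    xbar, sqbar = mean(x), mean(sq)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (v - sqbar) for xi, v in zip(x, sq))
    ssr_star = sxy ** 2 / sxx
    return (ssr_star / 2) / (sse / n) ** 2

# Made-up residuals whose spread grows with X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.2, -0.2, 2.0, -2.0, 3.0, -3.0]
print("BP statistic:", round(breusch_pagan(x, resid), 2))
```

Unlike the modified Levene test, this test assumes normally distributed error terms and a large sample, so the two tests can disagree on small or heavy-tailed data sets.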

11 F Test for Lack of Fit
- A comparison of the “Full Model” error sum of squares and the “Lack of Fit” sum of squares.
- For best results, requires repeat observations at one or more X levels.
- Full model: $Y_{ij} = \mu_j + \varepsilon_{ij}$ ($\mu_j$ = mean response when $X = X_j$)
- Reduced model: $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$ (Why “Reduced”?)

12 F Test for Lack of Fit
- SSE(Full) = SSPE = $\sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2$ (labeled “Pure Error” since it leads to an unbiased estimator of the true error variance; see 3.31 and 3.32, page 123)
- SSLF = SSE(Reduced) − SSPE (where SSE(Reduced) = SSE from the ordinary least squares regression model)
- Test statistic: $F^* = \dfrac{SSLF/(c - p)}{SSPE/(n - c)}$, where c is the number of distinct X levels (what is “p”?)
Be sure to compare the ANOVA table on page 126 with the OLS ANOVA table.
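The decomposition on this slide can be sketched as below. This is a minimal illustration that assumes a simple linear reduced model, so two regression parameters (intercept and slope) are subtracted in the numerator degrees of freedom; c denotes the number of distinct X levels. The function name and the data are made up.

```python
from collections import defaultdict
from statistics import mean

def lack_of_fit_f(x, y):
    """Return (F*, df_numerator, df_denominator) for the lack-of-fit test."""
    # Full model: one mean per distinct X level, giving the pure error SSPE.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sspe = 0.0
    for ys in groups.values():
        ybar_j = mean(ys)
        sspe += sum((yi - ybar_j) ** 2 for yi in ys)
    # Reduced model: ordinary least squares line, giving SSE(Reduced).
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse_reduced = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sslf = sse_reduced - sspe
    n, c = len(y), len(groups)
    # Needs c > 2, n > c, and at least one level with replicates (SSPE > 0).
    f_star = (sslf / (c - 2)) / (sspe / (n - c))
    return f_star, c - 2, n - c

# Made-up data with two replicates at each of three X levels; the means are curved.
x = [1, 1, 2, 2, 3, 3]
y = [1.1, 0.9, 4.1, 3.9, 9.1, 8.9]
f_star, df1, df2 = lack_of_fit_f(x, y)
print("F* =", round(f_star, 2), "on", df1, "and", df2, "df")
```

A large F* relative to the F critical value on (c - 2, n - c) degrees of freedom indicates that a straight line does not describe the level means adequately.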

13 Overview of some Remedial Measures
The problem: Simple linear regression is not appropriate.
The solution:
1. Abandon the model (“Eagle to Hawk; abort mission and return to base”).
2. Remedy the situation:
   - If the error terms are not independent, work with a model that calls for correlated error terms (Ch. 12).
   - If there is heteroscedasticity, use the WLS method to estimate parameters (Ch. 10) or use transformations of the data.
   - If the scatter plot indicates non-linearity, either use a non-linear regression function (Ch. 7) or transform to linear.
NEXT: We will look at one such powerful transformation method.

14 The Box-Cox Transformation Method
- The family of power transforms on Y is given as: $Y' = Y^{\lambda}$
- The family easily includes simple transforms such as the square root, the square, etc.
- By definition, when $\lambda = 0$, $Y' = \log_e Y$
- When the response variable is so transformed, the normal error regression model becomes: $Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \varepsilon_i$
- We would like to determine the “best” value of $\lambda$
Method 1: Maximum likelihood estimation

15 The Box-Cox Transformation Method
Method 2: Numerical search
Step 1: Set a value of $\lambda$.
Step 2: Standardize the $Y_i$ observations:
   If $\lambda \neq 0$: $W_i = K_1 (Y_i^{\lambda} - 1)$
   If $\lambda = 0$: $W_i = K_2 (\log_e Y_i)$
   where $K_2 = \left(\prod_{i=1}^{n} Y_i\right)^{1/n}$ and $K_1 = \dfrac{1}{\lambda K_2^{\lambda - 1}}$
Step 3: Regress the set W on the set X.
Step 4: Note the corresponding SSE.
Step 5: Change $\lambda$ and repeat steps 2 to 4 until the lowest SSE is obtained.
Let’s try both methods with the GMAT data. What should we get as the best $\lambda$?
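Steps 1 to 5 can be sketched as a simple grid search. This is a minimal illustration under stated assumptions: K2 is taken as the geometric mean of the Y values and K1 as 1 / (lambda * K2^(lambda - 1)), all Y values must be positive, and the grid of candidate lambdas is arbitrary. The function names are hypothetical and the data are made up (this is not the GMAT data set).

```python
import math
from statistics import mean

def sse_of_line(x, w):
    """SSE from the least squares regression of w on x."""
    xbar, wbar = mean(x), mean(w)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (wi - wbar) for xi, wi in zip(x, w)) / sxx
    b0 = wbar - b1 * xbar
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))

def box_cox_standardize(y, lam):
    """Step 2: standardized W values so SSEs are comparable across lambdas."""
    k2 = math.exp(mean([math.log(yi) for yi in y]))  # geometric mean of Y
    if abs(lam) < 1e-12:                             # lambda = 0: log transform
        return [k2 * math.log(yi) for yi in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (yi ** lam - 1) for yi in y]

def box_cox_search(x, y, lambdas):
    """Steps 1 to 5: return (best lambda, {lambda: SSE}) over the given grid."""
    sse_by_lambda = {lam: sse_of_line(x, box_cox_standardize(y, lam))
                     for lam in lambdas}
    best = min(sse_by_lambda, key=sse_by_lambda.get)
    return best, sse_by_lambda

# Made-up data: Y is the square of a linear function of X.
x = list(range(1, 11))
y = [(1 + 2 * xi) ** 2 for xi in x]
grid = [i / 10 for i in range(-20, 21)]  # lambda from -2.0 to 2.0 in steps of 0.1
best, sse_by_lambda = box_cox_search(x, y, grid)
print("best lambda on this grid:", best)
```

In practice the SSE curve is often flat near its minimum, so a convenient nearby value (such as 0, 0.5, or 1) is usually preferred over the exact grid minimizer.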

