1
KNN Ch. 3: Diagnostics and Remedial Measures
Applied Regression Analysis, BUSI 6220
2
Diagnostics for the Predictor Variable
Dot plots, sequence plots, and stem-and-leaf plots: essentially used to check for outlying observations, which will be useful in later diagnosis.
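A minimal sketch of two of these predictor checks in Python with matplotlib; the data values below are hypothetical placeholders for the course data set.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical predictor values; replace with the actual X column.
x = np.array([16, 14, 22, 10, 14, 17, 10, 13, 19, 16, 27, 18])

fig, axes = plt.subplots(1, 2, figsize=(10, 3))

# Dot plot: every observation displayed along the X axis; outliers stand apart.
axes[0].plot(x, np.zeros_like(x), "o", alpha=0.5)
axes[0].set_yticks([])
axes[0].set_xlabel("X")
axes[0].set_title("Dot plot")

# Sequence plot: X against observation order, to spot drifts or unusual points.
axes[1].plot(range(1, len(x) + 1), x, "o-")
axes[1].set_xlabel("Observation number")
axes[1].set_ylabel("X")
axes[1].set_title("Sequence plot")

plt.tight_layout()
plt.show()
```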
3
Residual Analysis: Why Look at the Residuals?
- Detect non-linearity of the regression function
- Detect heteroscedasticity (lack of constant variance)
- Detect autocorrelation
- Detect outliers
- Detect non-normality
- Check whether important predictor variables were left out
Regression model assumptions:
- Errors are independent (have zero covariance)
- Errors have constant variance
- Errors are normally distributed
4
Diagnostics for Residuals
What to look for: non-linearity of the regression function, heteroscedasticity, autocorrelation, outliers, non-normality, important predictor variables left out.
Plots of residuals:
1. Against the predictor (if X1 is the only predictor)
2. Absolute or squared residuals against the predictor
3. Against fitted values (when there are many Xi)
4. Against time
5. Against omitted predictor variables
6. Box plot
7. Normal probability plot
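A minimal sketch of a few of these residual plots, assuming a simple linear regression fitted by least squares; the data and variable names are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data; replace with the actual X and Y columns.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 2 + 3 * x + rng.normal(0, 2, 40)

# Fit simple linear regression by least squares and compute residuals.
b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
resid = y - fitted

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, resid)            # residuals against the predictor
axes[0].set_title("e vs. X")
axes[1].scatter(fitted, resid)       # residuals against fitted values
axes[1].set_title("e vs. fitted")
axes[2].scatter(x, np.abs(resid))    # absolute residuals against the predictor
axes[2].set_title("|e| vs. X")
for ax in axes:
    ax.axhline(0, color="gray", lw=0.5)
plt.tight_layout()
plt.show()
```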
5
Diagnostics for Residuals: Normal Probability Plot
Approximate expected value of the k-th smallest residual under normality: √MSE · z[(k − 0.375) / (n + 0.25)], where z(A) denotes the A-th percentile of the standard normal distribution.
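A minimal sketch of this computation and the resulting normal probability plot, assuming residuals from a fitted simple linear regression; the residual values below are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def expected_ordered_residuals(resid, mse):
    """Approximate expected values of the ordered residuals under normality:
    sqrt(MSE) * z[(k - 0.375) / (n + 0.25)] for k = 1, ..., n."""
    n = len(resid)
    k = np.arange(1, n + 1)
    return np.sqrt(mse) * stats.norm.ppf((k - 0.375) / (n + 0.25))

# Hypothetical residuals; MSE uses n - 2 df for simple linear regression.
resid = np.array([1.2, -0.7, 0.3, -1.8, 0.9, 0.1])
mse = np.sum(resid**2) / (len(resid) - 2)
exp_vals = expected_ordered_residuals(resid, mse)

# Normal probability plot: ordered residuals against their expected values.
plt.scatter(exp_vals, np.sort(resid))
plt.xlabel("Expected value under normality")
plt.ylabel("Ordered residual")
plt.show()
```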
6
Tests involving Residuals: The Correlation Test for Normality
H0: The residuals are normal
HA: The residuals are not normal
Compute the correlation between the e_i's and their expected values under normality. Use Table B.6: the observed coefficient of correlation should be at least as large as the table value for a given level of significance.
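A minimal sketch of the correlation test; the critical value still has to be looked up in Table B.6, which is not reproduced here, and the residuals are hypothetical.

```python
import numpy as np
from scipy import stats

def normality_correlation(resid):
    """Correlation between ordered residuals and their expected values under
    normality; compare the result against the Table B.6 critical value."""
    n = len(resid)
    k = np.arange(1, n + 1)
    mse = np.sum(resid**2) / (n - 2)          # assumes simple linear regression residuals
    expected = np.sqrt(mse) * stats.norm.ppf((k - 0.375) / (n + 0.25))
    return np.corrcoef(np.sort(resid), expected)[0, 1]

resid = np.array([1.2, -0.7, 0.3, -1.8, 0.9, 0.1, 0.5, -0.4])
r = normality_correlation(resid)
print(f"observed correlation = {r:.4f}")
# Do not reject normality if r >= the Table B.6 value for the chosen alpha and n.
```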
7
Tests involving Residuals: Other Tests for Normality
H0: The residuals are normal
HA: The residuals are not normal
- Anderson-Darling (very powerful; may be used for small samples, n < 25)
- Ryan-Joiner
- Shapiro-Wilk
- Kolmogorov-Smirnov
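A minimal sketch of how several of these tests can be run on the residuals with scipy; the Ryan-Joiner test is not available in scipy (it is essentially the correlation test above), and the residuals are hypothetical.

```python
import numpy as np
from scipy import stats

resid = np.array([1.2, -0.7, 0.3, -1.8, 0.9, 0.1, 0.5, -0.4])  # hypothetical residuals

# Shapiro-Wilk
W, p_sw = stats.shapiro(resid)
print(f"Shapiro-Wilk: W = {W:.4f}, p = {p_sw:.4f}")

# Anderson-Darling (returns the statistic and critical values, not a p-value)
ad = stats.anderson(resid, dist="norm")
print(f"Anderson-Darling: A^2 = {ad.statistic:.4f}, 5% critical value = {ad.critical_values[2]:.4f}")

# Kolmogorov-Smirnov against a normal with estimated mean and sd
# (estimating the parameters from the same data makes this test only approximate)
p_ks = stats.kstest(resid, "norm", args=(resid.mean(), resid.std(ddof=1))).pvalue
print(f"Kolmogorov-Smirnov: p = {p_ks:.4f}")
```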
9
Tests involving Residuals (Constancy of Error Variance): The Modified Levene Test
Partitions the observations into two groups by the independent variable (high X values and low X values), then tests the null hypothesis
H0: The two groups have equal error variances
Similar to a pooled-variance t-test for the difference in the means of two independent samples.
It is robust to departures from normality of the error terms.
A large sample size is essential so that dependencies among the error terms can be neglected.
Uses the group "median" instead of the "mean" (why?).
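A minimal sketch of the modified Levene (Brown-Forsythe) test as described on this slide, splitting the observations at the median X; the data and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical data; replace with the actual X values and regression residuals.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60)
resid = rng.normal(0, 1 + 0.2 * x)            # variance deliberately grows with X

# Split residuals into low-X and high-X groups.
low = resid[x <= np.median(x)]
high = resid[x > np.median(x)]

# Absolute deviations from each group's MEDIAN (hence "modified" Levene).
d_low = np.abs(low - np.median(low))
d_high = np.abs(high - np.median(high))

# Pooled-variance two-sample t-test on the absolute deviations.
t, p = stats.ttest_ind(d_low, d_high, equal_var=True)
print(f"t* = {t:.3f}, p = {p:.4f}")   # small p suggests non-constant error variance

# scipy's built-in equivalent (Brown-Forsythe form of Levene's test):
W, p2 = stats.levene(low, high, center="median")
print(f"Levene W = {W:.3f}, p = {p2:.4f}")
```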
10
Tests involving Residuals (Constancy of Error Variance): The Modified Levene Test
Read the "Comments" on page 118 and go through the Breusch-Pagan test on page 119.
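For reference, a minimal sketch of the Breusch-Pagan test using statsmodels on an already-fitted OLS model; the data and variable names are placeholders.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data; replace with the actual X and Y columns.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 80)
y = 2 + 3 * x + rng.normal(0, 1 + 0.3 * x)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# het_breuschpagan returns (LM statistic, LM p-value, F statistic, F p-value).
lm, lm_p, fstat, f_p = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan LM = {lm:.3f}, p = {lm_p:.4f}")
```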
11
F Test for Lack of Fit
A comparison of the "full model" sum of squares error with the "lack of fit" sum of squares.
For best results, requires repeat observations at at least one X level.
Full model: Y_ij = μ_j + ε_ij (μ_j = mean response when X = X_j)
Reduced model: Y_ij = β_0 + β_1 X_j + ε_ij (why "reduced"?)
12
F Test for Lack of Fit
SSE(Full) = SSPE = Σ_j Σ_i (Y_ij − Ȳ_j)², labeled "pure error" since it gives an unbiased estimator of the true error variance (see 3.31 and 3.32, page 123).
SSLF = SSE(Reduced) − SSPE, where SSE(Reduced) is the SSE from the ordinary least squares regression model.
Test statistic: F* = [SSLF / (c − p)] / [SSPE / (n − c)], where c is the number of distinct X levels (what is "p"?).
Be sure to compare the ANOVA table on page 126 with the OLS ANOVA table.
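A minimal sketch of the lack-of-fit F test for simple linear regression (so p = 2), assuming repeat observations at some X levels; the data below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical data with repeat observations at several X levels.
x = np.array([1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6])
y = np.array([2.1, 2.5, 3.9, 6.2, 5.8, 6.5, 7.9, 10.2, 9.8, 12.5, 11.9, 12.2])
n, p = len(y), 2                                  # p = number of regression parameters

# Reduced model: ordinary least squares line.
b1, b0 = np.polyfit(x, y, 1)
sse_reduced = np.sum((y - (b0 + b1 * x)) ** 2)

# Full model: a separate mean at each distinct X level (pure error).
levels = np.unique(x)
sspe = sum(np.sum((y[x == xj] - y[x == xj].mean()) ** 2) for xj in levels)
c = len(levels)

sslf = sse_reduced - sspe
f_star = (sslf / (c - p)) / (sspe / (n - c))
p_value = stats.f.sf(f_star, c - p, n - c)
print(f"F* = {f_star:.3f}, p = {p_value:.4f}")    # small p indicates lack of fit
```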
13
Overview of Some Remedial Measures
The problem: simple linear regression is not appropriate. The solutions:
1. Abandon the model ("Eagle to Hawk; abort mission and return to base").
2. Remedy the situation:
- If the error terms are not independent, work with a model that calls for correlated error terms (Ch. 12).
- If there is heteroscedasticity, use the WLS method to estimate the parameters (Ch. 10) or use transformations of the data.
- If the scatter plot indicates non-linearity, either use a non-linear regression function (Ch. 7) or transform to linear.
NEXT: We will look at one such powerful transformation method.
14
The Box-Cox Transformation Method
The family of power transforms on Y is given as: Y' = Y^λ
The family easily includes simple transforms such as the square root, the square, etc.
By definition, when λ = 0, Y' = log_e Y.
When the response variable is so transformed, the normal error regression model becomes: Y_i^λ = β_0 + β_1 X_i + ε_i
We would like to determine the "best" value of λ.
Method 1: Maximum likelihood estimation
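For Method 1, a minimal sketch using scipy's maximum-likelihood Box-Cox routine; note that it maximizes the normal likelihood of Y alone rather than of the full regression model, so treat it only as a rough guide. The Y values below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical positive response values; replace with the actual Y column.
y = np.array([4.2, 7.9, 3.1, 12.5, 6.4, 9.8, 5.5, 15.2, 8.1, 10.6])

# lmbda=None asks scipy to estimate lambda by maximum likelihood.
y_transformed, lam_hat = stats.boxcox(y)
print(f"estimated lambda = {lam_hat:.3f}")
```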
15
The Box-Cox Transformation Method
Method 2: Numerical search
Step 1: Set a value of λ.
Step 2: Standardize the Y_i observations:
If λ ≠ 0, then W_i = K_1 (Y_i^λ − 1)
If λ = 0, then W_i = K_2 (log_e Y_i)
where K_2 = (Π Y_i)^(1/n) (the geometric mean of the Y_i) and K_1 = 1 / (λ K_2^(λ−1))
Step 3: Now regress the set W on the set X.
Step 4: Note the corresponding SSE.
Step 5: Change λ and repeat steps 2 to 4 until the lowest SSE is obtained.
Let's try this method with the GMAT data. What should we get as the best λ?
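A minimal sketch of the numerical search in Method 2; the GMAT data set itself is not reproduced here, so the X and Y arrays below are placeholders.

```python
import numpy as np

def boxcox_sse(x, y, lambdas):
    """For each lambda: standardize Y to W (steps 1-2), regress W on X
    (step 3), and record the SSE (step 4)."""
    k2 = np.exp(np.mean(np.log(y)))               # geometric mean of Y
    out = []
    for lam in lambdas:
        if abs(lam) < 1e-12:                      # lambda = 0 case
            w = k2 * np.log(y)
        else:
            k1 = 1.0 / (lam * k2 ** (lam - 1))
            w = k1 * (y ** lam - 1)
        b1, b0 = np.polyfit(x, w, 1)              # regress W on X
        out.append(np.sum((w - (b0 + b1 * x)) ** 2))
    return np.array(out)

# Placeholder data standing in for the GMAT example.
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 30)
y = np.exp(0.3 + 0.2 * x + rng.normal(0, 0.1, 30))   # roughly log-linear in X

lambdas = np.arange(-2, 2.05, 0.1)
sse = boxcox_sse(x, y, lambdas)
best = lambdas[np.argmin(sse)]                        # step 5: lowest SSE wins
print(f"best lambda ~= {best:.2f}")
```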