Regression Analysis Part D Model Building Read Chapters 3, 4 and 5 of Forecasting and Time Series, An Applied Approach. L01D MGS 8110 - Regression - Model Building
Regression Analysis Modules
Part A – Basic Model & Parameter Estimation
Part B – Calculation Procedures
Part C – Inference: Confidence Intervals & Hypothesis Testing
Part D – Goodness of Fit
Part E – Model Building
Part F – Transformed Variables
Part G – Standardized Variables
Part H – Dummy Variables
Part I – Eliminating Intercept
Part J – Outliers
Part K – Regression Example #1
Part L – Regression Example #2
Part N – Non-linear Regression
Part P – Non-linear Example R
L01D MGS 8110 - Regression - Goodness of Fit L01C MGS 8110 - Regression Inference
Overview of Goodness of Fit
Standard Error
Prediction Interval
Validation
C Statistic
R²adjusted, Adjusted Coefficient of Determination
Goodness of Fit
Primary Measures
  R², Coefficient of Determination
  se, Standard Error of Regression
  Prediction interval
  Validation of Fit
  C Statistic
Secondary Measures
  R²adjusted, Adjusted Coefficient of Determination
  R, correlation between observed and predicted
  Press Statistic
Goodness of Fit: R² – Coefficient of Determination
Varies between 0 and 1.
“Is the proportion of the variability in the dependent variable that is accounted for by the regression equation.”
Calculated as
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
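As a minimal sketch of this calculation, assuming the usual definition of R² as one minus the ratio of the residual sum of squares to the total sum of squares, and a single independent variable. The data values and the helper names `fit_line` and `r_squared` are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst


# Illustrative data only (not from the course material).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(f"b0={b0:.2f}, b1={b1:.2f}, R^2={r2:.3f}")
```

With these numbers the regression accounts for 60% of the variability in y.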
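Goodness of Fit: se – Standard Error of Regression
se is the standard deviation of the residuals.
Approximately 68%, 95% and 99.7% of the residuals will be within plus or minus 1, 2 and 3 se, respectively, when the residuals are roughly normally distributed.
Calculated as
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k}}$$
where n is the number of observations and k is the number of estimated coefficients (independent variables plus the intercept).
se does not necessarily get smaller as n gets larger.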
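The following sketch computes se for a one-variable regression and counts how many residuals fall within 1, 2 and 3 se. It assumes the residual degrees of freedom are n minus the number of estimated coefficients; the data and function names are illustrative only.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def standard_error(y, y_hat, k):
    """se = sqrt(SSE / (n - k)), where k counts the estimated coefficients."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - k))


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
se = standard_error(y, y_hat, k=2)  # intercept plus one slope

# Share of residuals within +/- 1, 2 and 3 se.
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
shares = {m: sum(abs(e) <= m * se for e in residuals) / len(residuals)
          for m in (1, 2, 3)}
for m, share in shares.items():
    print(f"within +/-{m} se: {share:.0%}")
```

With only five observations the shares cannot match the normal-distribution percentages closely; the rule of thumb is meant for larger samples.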
Goodness of Fit: Comparison of R² and se
Goodness of Fit: Prediction Interval
Calculated as
$$\hat{y}_f \pm t_{\alpha/2,\,n-k}\; s_e \sqrt{1 + x_f'(X'X)^{-1}x_f}$$
where x_f is the vector of independent-variable values (with a leading 1 for the intercept) at the point Xf being predicted.
Includes se and is a more encompassing measure of goodness of fit than se alone.
More difficult to calculate than se and not a unique value (there is a prediction interval for every possible Xf).
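To illustrate why there is a different prediction interval for every Xf, here is a minimal sketch for a regression with one independent variable. It assumes the standard interval, in which the half-width is t times se times sqrt(1 + 1/n + (Xf - x̄)²/Sxx). The t value is passed in from a t table rather than computed, and the data and function name are illustrative.

```python
import math


def prediction_interval(x, y, x_f, t_crit):
    """Prediction interval for a new observation at x_f (one-variable regression)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))  # two estimated coefficients
    y_f = b0 + b1 * x_f
    # The distance of x_f from the mean of x widens the interval.
    half_width = t_crit * se * math.sqrt(1 + 1 / n + (x_f - x_bar) ** 2 / sxx)
    return y_f - half_width, y_f + half_width


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # two-sided 95% t value for n - 2 = 3 degrees of freedom

intervals = {x_f: prediction_interval(x, y, x_f, T_CRIT) for x_f in (3, 6)}
for x_f, (low, high) in intervals.items():
    print(f"Xf={x_f}: {low:.2f} to {high:.2f}")
```

The interval at Xf = 6, outside the observed range of x, is wider than the interval at Xf = 3, the mean of x.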
Goodness of Fit: C Statistic
Manually calculated in Excel & SPSS as
$$C = \frac{SSE}{s^2} - (n - 2k)$$
where k = p + 1, p = # variables in the transformed database, SSE is the error sum of squares of the model being evaluated, and s² is the mean squared error of the model that contains all candidate variables.
Model selection criterion: among models with C ≤ k, select the one with the smallest C value. A model with C > k is a poor model.
A less asymptotic measure of fit than R², which can only rise toward 1 as variables are added.
Explicitly takes into consideration the number of variables in the model.
Does not consider the intrinsic characteristics of the variables.
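The selection rule can be sketched as follows, assuming the C statistic takes the standard Mallows form C = SSE/s² - (n - 2k), with s² taken from the model containing all candidate variables. The SSE values, variable names and the function name `c_statistic` are hypothetical and serve only to show the arithmetic.

```python
def c_statistic(sse, s2_full, n, k):
    """C = SSE / s^2 - (n - 2k); k counts coefficients including the intercept."""
    return sse / s2_full - (n - 2 * k)


n = 20
# Hypothetical candidate models: name -> (SSE, k).
candidates = {
    "x1": (60.0, 2),
    "x1,x2": (34.0, 3),
    "x1,x2,x3": (32.0, 4),  # full model with all candidate variables
}

full_sse, full_k = candidates["x1,x2,x3"]
s2_full = full_sse / (n - full_k)  # mean squared error of the full model

scores = {name: c_statistic(sse, s2_full, n, k)
          for name, (sse, k) in candidates.items()}

# Keep models with C <= k, then choose the smallest C among them.
acceptable = {name: c for name, c in scores.items() if c <= candidates[name][1]}
best = min(acceptable, key=acceptable.get)

for name, c in scores.items():
    print(f"{name}: C={c:.1f}, k={candidates[name][1]}")
print("selected:", best)
```

Note that the full model always has C equal to its own k under this form, so the criterion is only informative for the smaller candidate models.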
Goodness of Fit: Validation of Fit (1 of 5)
Collect a sample of data and fit a regression equation to the data.
Collect a second, comparable sample of data and see how well the previously derived regression equation agrees with this data.
Do not actually fit a regression equation to the second set of data.
Manually calculate the R² (and se) for the second data set.
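Goodness of Fit: Validation of Fit (2 of 5)
Calculate the R² of the second sample as
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$
where the y_i and their mean come from the second sample and each predicted value is computed with the regression equation fitted to the original sample.
See if this R² is almost as good as the original R².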
Goodness of Fit: Validation of Fit (3 of 5)
The two alternative formulas for R² will give identical results for the original sample but different results for the validation sample.
An infeasible R² (less than 0 or greater than 1) may be obtained for the validation sample if the second formula is used. This is more likely to occur if the sample size for the validation sample is very small.
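The behaviour described above can be checked with a small sketch. It assumes the two alternative forms are the residual form 1 - SSE/SST and the explained form SSR/SST, and it applies the equation fitted to the original sample to the validation data without refitting. All data values and function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r2_two_ways(x, y, b0, b1):
    """Return (1 - SSE/SST, SSR/SST) for data scored with fixed coefficients."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    return 1 - sse / sst, ssr / sst


# Original sample: fit the equation here.
x_orig = [1, 2, 3, 4, 5]
y_orig = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x_orig, y_orig)

# Validation samples: reuse b0 and b1, do not refit.
x_val = [2, 4, 6]
y_val = [3, 6, 5]   # roughly comparable to the original sample
y_bad = [5, 3, 1]   # deliberately incompatible, very small sample

orig = r2_two_ways(x_orig, y_orig, b0, b1)
val = r2_two_ways(x_val, y_val, b0, b1)
bad = r2_two_ways(x_val, y_bad, b0, b1)

for label, pair in (("original", orig), ("validation", val), ("incompatible", bad)):
    print(f"{label}: 1-SSE/SST={pair[0]:.3f}, SSR/SST={pair[1]:.3f}")
```

On the original sample the two forms agree. On the validation samples they diverge, and with the small incompatible sample one form falls below 0 while the other exceeds 1.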
Goodness of Fit: Validation of Fit (4 of 5)
The sum of squares forms (shown on the right) of the squared-deviation formulas are also not valid for the validation database.
(Formula labels on the slide: one formula marked USE, three marked Do NOT use.)
Goodness of Fit: Validation of Fit (5 of 5)
Likewise, the matrix form of the sum-of-squares formulas should NOT be used.
(Formula labels on the slide: two formulas marked Do NOT use.)
None of the matrix formulations can be used.
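A short sketch of why the shortcut forms fail on validation data. It assumes the shortcut in question is the usual computational form SSE = Σy² - b0·Σy - b1·Σxy, which is the one-variable version of the matrix form y'y - b'X'y. That identity relies on the normal equations, which hold only for the sample the coefficients were fitted to. Data and function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sse_deviation(x, y, b0, b1):
    """SSE from squared deviations: sum of (y - y_hat)^2. Valid for any sample."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def sse_shortcut(x, y, b0, b1):
    """Computational form sum(y^2) - b0*sum(y) - b1*sum(x*y), i.e. y'y - b'X'y."""
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return sum_y2 - b0 * sum(y) - b1 * sum_xy


x_orig, y_orig = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
x_val, y_val = [2, 4, 6], [3, 6, 5]

b0, b1 = fit_line(x_orig, y_orig)  # fitted on the original sample only

orig_dev = sse_deviation(x_orig, y_orig, b0, b1)
orig_short = sse_shortcut(x_orig, y_orig, b0, b1)
val_dev = sse_deviation(x_val, y_val, b0, b1)
val_short = sse_shortcut(x_val, y_val, b0, b1)

print(f"original:   deviation={orig_dev:.2f}, shortcut={orig_short:.2f}")
print(f"validation: deviation={val_dev:.2f}, shortcut={val_short:.2f}")
```

The two calculations match on the original sample and disagree on the validation sample, so only the squared-deviation calculation should be trusted there.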
Goodness of Fit: Validation of Fit – Numerical Example (1 of 5)
Goodness of Fit: Validation of Fit – Numerical Example (2 of 5)
Using Algebraic Formulas on Original Data
Goodness of Fit: Validation of Fit – Numerical Example (3 of 5)
Using Matrix Formulas on Original Data
Same values, so the same R² results.
Goodness of Fit: Validation of Fit – Numerical Example (4 of 5)
Using Algebraic Formulas on Verification Data
Goodness of Fit: Validation of Fit – Numerical Example (5 of 5)
Using Matrix Formulas on Verification Data
Goodness of Fit: R²a – Adjusted (Corrected) Coefficient of Determination
A revised method of calculating an R² “type” of metric.
The advantage of this metric is that R²adjusted will decrease when a variable of marginal value is added to a regression equation.
The disadvantage of this metric is that it does not have a meaningful physical interpretation. In particular, R²adjusted does NOT represent the percentage of the total variation that is explained by the regression equation.
A second disadvantage is that the decrease in the R²adjusted value is very small and there are no decision rules for interpreting the reduction.
Goodness of Fit: R²a – Adjusted (Corrected) Coefficient of Determination
Alternative calculation procedures
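Two commonly used calculation procedures can be sketched as follows. The sketch assumes the standard definitions, with k counting the estimated coefficients including the intercept: one form starts from R², the other from the sums of squares. The function names and the R² values used for the marginal-variable comparison are hypothetical.

```python
def adj_r2_from_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


def adj_r2_from_ss(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE / (n - k)] / [SST / (n - 1)]."""
    return 1 - (sse / (n - k)) / (sst / (n - 1))


# Both procedures on the same illustrative fit: SSE = 2.4, SST = 6.0, R^2 = 0.6.
from_r2 = adj_r2_from_r2(0.6, n=5, k=2)
from_ss = adj_r2_from_ss(2.4, 6.0, n=5, k=2)
print(f"from R^2: {from_r2:.4f}, from sums of squares: {from_ss:.4f}")

# Hypothetical case: a marginal variable lifts R^2 only from 0.600 to 0.605.
before = adj_r2_from_r2(0.600, n=30, k=3)
after = adj_r2_from_r2(0.605, n=30, k=4)
print(f"before: {before:.4f}, after adding a marginal variable: {after:.4f}")
```

R² itself rises in the hypothetical case, while the adjusted value falls slightly, which is the small decrease the slide describes.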
Goodness of Fit: R, Multiple Correlation Coefficient
Correlation between the observed y’s and predicted y’s.
Calculated as
$$R = \sqrt{R^2}$$
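A brief sketch showing that the correlation between observed and predicted y values, squared, reproduces R² for a least-squares fit with an intercept. The data and the helper name `pearson` are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative data and a one-variable least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# R as the correlation between observed and predicted values.
R = pearson(y, y_hat)

# R^2 from the sums of squares, for comparison.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sse / sst

print(f"R={R:.4f}, R squared={R * R:.4f}, R^2 from sums of squares={r2:.4f}")
```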
Goodness of Fit: Press Statistic
Not calculated in Excel & SPSS and very tedious to calculate manually.
Does not provide a traditional measure of Goodness of Fit. Rather, provides
  a metric as to whether a model may be effective for predictions at extreme points in the database.
  an indication of which data points could be considered extreme points in the database.
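The sketch below assumes the usual definition of the Press statistic as the sum of squared leave-one-out prediction errors. It computes the value two ways for a one-variable regression: by refitting with each point removed, and with the leverage shortcut e_i / (1 - h_i), where h_i = 1/n + (x_i - x̄)²/Sxx. The data and function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def press_leave_one_out(x, y):
    """Refit without each point in turn and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def press_from_leverage(x, y):
    """Same statistic from one fit, using residuals and leverages h_i."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverages))
    return press, leverages


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

press_loo = press_leave_one_out(x, y)
press_short, leverages = press_from_leverage(x, y)
print(f"PRESS (refitting) = {press_loo:.4f}, PRESS (leverage) = {press_short:.4f}")
print("leverages:", [round(h, 2) for h in leverages])
```

The points with the largest leverages, here the smallest and largest x values, are the ones that would count as extreme points in the database.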