1
Regression Analysis Part D Model Building
Read Chapters 3, 4 and 5 of Forecasting and Time Series, An Applied Approach. L01D MGS Regression - Model Building
2
Regression Analysis Modules
Part A – Basic Model & Parameter Estimation
Part B – Calculation Procedures
Part C – Inference: Confidence Intervals & Hypothesis Testing
Part D – Goodness of Fit
Part E – Model Building
Part F – Transformed Variables
Part G – Standardized Variables
Part H – Dummy Variables
Part I – Eliminating Intercept
Part J – Outliers
Part K – Regression Example #1
Part L – Regression Example #2
Part N – Non-linear Regression
Part P – Non-linear Example R
L01D MGS Regression - Goodness of Fit L01C MGS Regression Inference
3
Overview of Goodness of Fit
Standard Error
Prediction Interval
Validation
C Statistic
R2adjusted, Adjusted Coefficient of Determination
4
Goodness of Fit
Primary Measures:
  R2, Coefficient of Determination
  se, Standard Error of Regression
  Prediction interval
  Validation of Fit
  C Statistic
Secondary Measures:
  R2adjusted, Adjusted Coefficient of Determination
  R, correlation between observed and predicted
  Press Statistic
5
Goodness of Fit: R2 – Coefficient of Determination
Varies between 0 and 1.
“Is the proportion of the variability in the dependent variable that is accounted for by the regression equation.”
Calculated as $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of the dependent variable about its mean.
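The definition above can be sketched in a few lines of Python. This is an illustration only: the function name `r_squared` and the data values are assumptions, not taken from the slides.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1.0 - sse / sst

# Hypothetical observations and least-squares fitted values (illustration only).
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

Here SSE is 2.4 and SST is 6, so 60% of the variability in y is accounted for by the fitted line.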
6
Goodness of Fit: se – Standard Error of Regression
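se is the standard deviation of the residuals.
Approximately 68%, 95% and 99.7% of the residuals will fall within plus or minus 1, 2 and 3 se, respectively (assuming roughly normally distributed residuals).
Calculated as $s_e = \sqrt{\mathrm{SSE}/(n-k)}$, where n is the number of observations and k is the number of estimated coefficients, including the intercept.
se does not necessarily get smaller as n gets larger.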
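A minimal sketch of the standard error of regression, assuming the usual degrees-of-freedom divisor n - k with k counting the intercept. The function name and the data are hypothetical.

```python
import math

def standard_error(y, y_hat, k):
    """s_e = sqrt(SSE / (n - k)); k = number of estimated coefficients incl. intercept."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k))

# Hypothetical observations and fitted values from a one-variable model (k = 2).
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
se = standard_error(y, y_hat, k=2)

# Share of residuals that fall within plus or minus 2 se.
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
within_2se = sum(abs(e) <= 2 * se for e in residuals) / len(residuals)
print(round(se, 4), within_2se)
```

With only five points the 68/95/99.7 percentages are a rough guide at best; the check is more informative on larger samples.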
7
Goodness of Fit: Comparison R2 and se
8
Goodness of Fit: Prediction Interval
Calculated from se, a t critical value and the distance of Xf from the sample data.
Includes se and is a more encompassing measure of goodness of fit than just se.
More difficult to calculate than se and not a unique value (there is a prediction interval for every possible Xf).
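To show why there is a different interval for every Xf, here is a sketch for the simple (one-variable) regression case. It assumes the textbook form y_f ± t · se · sqrt(1 + 1/n + (Xf − x̄)² / Sxx); the slides may use a matrix form for multiple regression. The function name, the data and the t value read from a table are all assumptions.

```python
import math

def prediction_interval(x, y, x_f, t_crit):
    """Prediction interval for a new observation at x_f (simple linear regression).

    t_crit is the t critical value for n - 2 degrees of freedom,
    supplied by the caller (for example, read from a t table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))
    half_width = t_crit * se * math.sqrt(1 + 1 / n + (x_f - x_bar) ** 2 / sxx)
    y_f = b0 + b1 * x_f
    return y_f - half_width, y_f + half_width

# Hypothetical data; 3.182 is the two-sided 95% t value for 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182
near = prediction_interval(x, y, x_f=3.0, t_crit=T_CRIT)  # Xf at the mean of x
far = prediction_interval(x, y, x_f=6.0, t_crit=T_CRIT)   # Xf outside the data
print(near, far)
```

The interval at the mean of x is the narrowest; it widens as Xf moves away from the data.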
9
Goodness of Fit: C Statistic
Manually calculated in Excel & SPSS.
k = p + 1, where p = number of variables in the transformed database.
Model Selection Criterion: among models with C <= k, select the one with the smallest C value. Poor model if C > k.
A less asymptotic measure of fit than R2.
Definitely takes into consideration the number of variables in the model.
Does not consider the intrinsic characteristics of the variables.
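The selection rule above can be sketched as follows, assuming the C statistic is Mallows' C, i.e. C = SSE_subset / MSE_full − (n − 2k). That formula is an assumption here, since the slide's own formula is not available. The candidate names and sums of squares are hypothetical.

```python
def c_statistic(sse_subset, mse_full, n, k):
    """Mallows-style C for a candidate model with k = p + 1 coefficients."""
    return sse_subset / mse_full - (n - 2 * k)

def select_model(candidates, mse_full, n):
    """Keep candidates with C <= k, then pick the one with the smallest C."""
    scored = {name: (c_statistic(sse, mse_full, n, k), k)
              for name, (sse, k) in candidates.items()}
    acceptable = {name: c for name, (c, k) in scored.items() if c <= k}
    best = min(acceptable, key=acceptable.get) if acceptable else None
    return scored, best

# Hypothetical candidates: name -> (SSE of that model, k = p + 1).
N = 20
MSE_FULL = 2.0  # mean squared error of the model containing all variables
candidates = {
    "x1,x2": (36.0, 3),
    "x1,x2,x3": (31.0, 4),
    "full": (30.0, 5),
}
scored, best = select_model(candidates, MSE_FULL, N)
print(scored, best)
```

Under this definition the full model always has C equal to its own k, so the rule C <= k is really a comparison against the full model.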
10
Goodness of Fit: Validation of Fit (1 of 5)
Collect a sample of data and fit a regression equation to the data.
Collect a second, comparable sample of data and see how well the previously derived regression equation agrees with this data.
Do not actually fit a regression equation to the second set of data.
Manually calculate the R2 (and se) for the second data set.
11
Goodness of Fit: Validation of Fit (2 of 5)
Calculate the R2 of the second sample from its observed values and the predictions of the original regression equation, and see if this R2 is almost as good as the original R2.
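The validation procedure can be sketched as below: fit on the first sample, then score the second sample with the same coefficients and no refitting. The sketch assumes R2 = 1 − SSE/SST with SST taken about the validation sample's own mean; both samples are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a one-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST, with SST about the mean of the supplied y."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Hypothetical original sample and validation sample.
x_fit, y_fit = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]
x_val, y_val = [2.0, 3.0, 4.0, 6.0], [3.0, 4.5, 4.0, 6.0]

b0, b1 = fit_line(x_fit, y_fit)  # fitted once, on the original sample only
r2_fit = r_squared(y_fit, [b0 + b1 * xi for xi in x_fit])
r2_val = r_squared(y_val, [b0 + b1 * xi for xi in x_val])  # no refit
print(round(r2_fit, 4), round(r2_val, 4))
```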
12
Goodness of Fit: Validation of Fit (3 of 5)
The two alternative formulas for R2 will give identical results for the original sample but different results for the validation sample.
An infeasible R2 (less than 0 or greater than 1) may be obtained for the validation sample if the second formula is used.
This is more likely to occur if the sample size for the validation sample is very small.
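A small numerical check of this behavior, assuming the two formulas are 1 − SSE/SST and SSR/SST (the slides do not say which is which, so the labels below are hypothetical). On the sample used for fitting the two agree; on a very small validation sample they diverge and can leave the 0 to 1 range.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a one-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def r2_residual_form(y, y_hat):
    """1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

def r2_explained_form(y, y_hat):
    """SSR/SST, with SSR measured about the mean of the observed y."""
    y_bar = sum(y) / len(y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr / sst

# Hypothetical original sample and a deliberately tiny validation sample.
x_fit, y_fit = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]
x_val, y_val = [1.0, 5.0], [3.9, 4.1]

b0, b1 = fit_line(x_fit, y_fit)
pred_fit = [b0 + b1 * xi for xi in x_fit]
pred_val = [b0 + b1 * xi for xi in x_val]

orig_a, orig_b = r2_residual_form(y_fit, pred_fit), r2_explained_form(y_fit, pred_fit)
val_a, val_b = r2_residual_form(y_val, pred_val), r2_explained_form(y_val, pred_val)
print(orig_a, orig_b, val_a, val_b)
```

In this contrived case the residual form falls below 0 and the explained form rises above 1, which illustrates why a tiny validation sample is a poor basis for judging fit.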
13
Goodness of Fit: Validation of Fit (4 of 5)
The sum-of-squares forms of the squared-deviation formulas are also not valid for the validation database.
[Slide formulas not recovered: one labeled "USE", three labeled "Do NOT use".]
14
Goodness of Fit: Validation of Fit (5 of 5)
Likewise, the matrix form of the sum-of-squares formulas should NOT be used.
None of the matrix formulations can be used.
[Slide formulas not recovered: two labeled "Do NOT use".]
15
Goodness of Fit: Validation of Fit – Numerical Example (1 of 5)
16
Goodness of Fit: Validation of Fit – Numerical Example (2 of 5)
Using Algebraic Formulas on Original Data
17
Goodness of Fit: Validation of Fit – Numerical Example (3 of 5)
Using Matrix Formulas on Original Data
Same values as the algebraic formulas, so the same R2 values.
18
Goodness of Fit: Validation of Fit – Numerical Example (4 of 5)
Using Algebraic Formulas on Verification Data
19
Goodness of Fit: Validation of Fit – Numerical Example (5 of 5)
Using Matrix Formulas on Verification Data
20
Goodness of Fit: R2a – Adjusted (Corrected) Coefficient of Determination
A revised method of calculating an R2 “type” of metric.
The advantage of this metric is that R2adjusted will decrease when a variable of marginal value is added to a regression equation.
The disadvantage of this metric is that it does not have a meaningful physical interpretation. In particular, R2adjusted does NOT represent the percentage of the total variation that is explained by the regression equation.
A second disadvantage is that the decrease in the R2adjusted value is very small and there are no decision rules to interpret the reduction in the R2adjusted value.
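A brief sketch of this behavior, assuming the common form R2adjusted = 1 − (1 − R2)(n − 1)/(n − k) with k = p + 1. The R2 values before and after adding a marginal variable are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k), with k = p + 1."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

N = 20
# Hypothetical fit before and after adding one variable of marginal value.
r2_before, k_before = 0.800, 3
r2_after, k_after = 0.801, 4

adj_before = adjusted_r_squared(r2_before, N, k_before)
adj_after = adjusted_r_squared(r2_after, N, k_after)
print(round(adj_before, 4), round(adj_after, 4))
```

R2 itself rises slightly when the variable is added, while the adjusted value falls, and the size of that fall is small, as noted above.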
21
Goodness of Fit: R2a – Adjusted (Corrected) Coefficient of Determination
Alternative calculation procedures
22
Goodness of Fit: R2a – Adjusted (Corrected) Coefficient of Determination
23
Goodness of Fit: R2a – Adjusted (Corrected) Coefficient of Determination
24
Goodness of Fit: R2a – Adjusted (Corrected) Coefficient of Determination
25
Goodness of Fit: R, Multiple Correlation Coefficient
Correlation between the observed y’s and predicted y’s.
Calculated as $R = \sqrt{R^2}$ for a least-squares fit that includes an intercept.
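The link between R and R2 can be checked numerically. The sketch below computes the Pearson correlation between observed and fitted values and compares it with the square root of R2; the data and function names are hypothetical, and the equality relies on the fitted values coming from least squares with an intercept.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical observations and their least-squares fitted values.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

multiple_r = pearson(y, y_hat)
y_bar = sum(y) / len(y)
r2 = 1.0 - (sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
            / sum((yi - y_bar) ** 2 for yi in y))
print(round(multiple_r, 4), round(math.sqrt(r2), 4))
```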
26
Goodness of Fit: Press Statistic
Not calculated in Excel & SPSS and very tedious to calculate manually.
Does not provide a traditional measure of Goodness of Fit. Rather, provides:
  a metric as to whether a model may be effective for predictions at extreme points in the database.
  an indication of which data points could be considered extreme points in the database.
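The tedium comes from refitting the model once per observation. A sketch of that leave-one-out calculation for a one-variable regression follows, assuming the standard definition PRESS = sum of squared deleted residuals; the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a one-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def deleted_residuals(x, y):
    """Residual for each point when the model is refitted without that point."""
    out = []
    for i in range(len(x)):
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0, b1 = fit_line(x_rest, y_rest)
        out.append(y[i] - (b0 + b1 * x[i]))
    return out

# Hypothetical data (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

d = deleted_residuals(x, y)
press = sum(e ** 2 for e in d)

# Ordinary SSE from the fit on all points, for comparison.
b0, b1 = fit_line(x, y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(round(press, 4), round(sse, 4), [round(e, 3) for e in d])
```

Points whose deleted residual is much larger than their ordinary residual are the ones that pull the fit toward themselves, which is how the statistic flags extreme points.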