Published by Isabella Summers. Modified over 9 years ago.
1
Forecasting Revenue: An Example of Regression Model Building
Setting: A possibly large set of predictor variables is available for predicting future quarterly revenues, using data collected from 2007 Q1 to 2013 Q2.
Goal: Find an equation (model) that explains the variation in Y with a smaller set of predictors that are all related to Y but not too strongly related to each other (which would cause multicollinearity). Predict revenues for 2013 Q3 to 2014 Q2.
Your dependent variable will be revenues or seasonally adjusted revenues, depending on whether your data show pronounced seasonality.
2
Forecasting Revenue: An Example of Regression Model Building
Hold out a sample for the validation process later: do not use 2013 Q1 and Q2 until after you have done the validation process.
Starting Point: Examine multicollinearity by checking a correlation matrix of the predictors and by generating VIF values. Among highly correlated predictors you have some choice about which to keep: prefer variables that have better forecasts available or that, in theory, should be most related to revenues.
3
Variance Inflation Factors
Variance Inflation Factor (VIF): A measure of how highly correlated each independent variable is with the other predictors in the model. Used to identify multicollinearity.
Values larger than 10 for a predictor imply large inflation of the standard errors of the regression coefficients due to this variable being in the model.
Inflated standard errors lead to insignificant t-statistics for regression coefficients and wider confidence intervals.
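The VIF definition above can be sketched in a few lines. This is a minimal illustration, not MegaStat's implementation: it assumes NumPy is available, regresses each predictor on the others, and uses the standard identity VIF = 1 / (1 - R-squared). The function name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X (rows = quarters, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A predictor that is nearly a copy of another one gets a very large VIF, while a predictor unrelated to the rest stays close to 1.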
4
Forecasting Revenue: An Example of Regression Model Building
Run a multiple regression to look at VIF values (and D-W values).
If any variables have VIF > 10, delete one of them: either the one with the highest VIF, or another high-VIF variable that may not have forecasts available. There is some flexibility in this step.
Repeat until all VIF values are smaller than 10. This will result in a reduced set of variables to use in finding an equation using All Possible Regressions.
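The repeat-until-done step can be sketched as a loop. This is a simplified illustration that always drops the predictor with the highest VIF; the slide also allows dropping a different high-VIF variable on judgment (for example, one without forecasts), which the code does not model. It assumes NumPy and uses the fact that the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The function name `drop_high_vif` is hypothetical.

```python
import numpy as np

def drop_high_vif(X, names, threshold=10.0):
    """Drop the highest-VIF predictor until every VIF is below threshold."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        corr = np.corrcoef(X[:, keep], rowvar=False)
        # Diagonal of the inverse correlation matrix gives the VIFs.
        vifs = np.diag(np.linalg.inv(corr))
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold:
            break
        keep.pop(worst)  # simplification: always remove the largest VIF
    return [names[i] for i in keep]
```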
5
Forecasting Revenue: An Example of Regression Model Building
Best Model Process using the data from 2007 Q1 to 2012 Q4 (2013 Q1 and Q2 are held out for validation).
Use MegaStat All Possible Regressions to find an equation in which all variables are significant (p-value < .05) and that has a small standard error (large adjusted R-squared).
The Cp statistic summarizes each possible model, and the "best" model can be selected based on it. Ideally, you select the model with the fewest predictors that has Cp ≤ p and p-values < .05 for all variables.
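A rough sketch of what an all-possible-regressions search computes is shown below. It is not MegaStat's code: it assumes NumPy, uses the common form of Mallows' Cp, Cp = SSE_subset / MSE_full - (n - 2p), and takes p to be the number of estimated parameters including the intercept (an assumption about the convention). It reports adjusted R-squared and Cp only; the p-values for individual coefficients would still need to be checked separately. The function names are hypothetical.

```python
import itertools
import numpy as np

def fit_sse(X, y):
    """Sum of squared residuals from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def all_possible_regressions(X, y, names):
    """Return (predictor names, p, adjusted R-squared, Cp) for every subset."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = fit_sse(X, y) / (n - k - 1)  # error variance from the full model
    rows = []
    for size in range(1, k + 1):
        for subset in itertools.combinations(range(k), size):
            sse = fit_sse(X[:, list(subset)], y)
            p = size + 1  # parameters, counting the intercept (assumed convention)
            adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / mse_full - (n - 2 * p)
            rows.append((tuple(names[i] for i in subset), p, adj_r2, cp))
    return rows
```

With k candidate predictors this fits 2^k - 1 models, which is one reason to reduce the predictor set with the VIF screen first.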
6
Validating Your Model
Validation with holdout sample: Forecast Q1 and Q2 2013 with 95% prediction intervals, and check the assumptions for the validation model. Do the actual values fall within the lower and upper prediction limits, implying that the predictions seem reasonable?
If so, use all 26 quarters, redo the equation using the same variables, and forecast 2013 Q3 to 2014 Q2.
If not, try an alternative model from the All Possible Regressions options, or see whether there is a reason that Q1 and/or Q2 of 2013 are different in some way. Look at the quarterly reports and see if they suggest the use of a dummy variable. Then redo the validation process.
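The holdout check can be sketched as follows, using the textbook prediction interval for a new observation: prediction ± t × s × sqrt(1 + x0'(X'X)^-1 x0). This is an illustration under stated assumptions: NumPy is available, and the caller supplies the t critical value `t_crit` (for example, about 2.080 for 95% with 21 error degrees of freedom, i.e. 24 quarters and 2 predictors). The function name and arguments are hypothetical.

```python
import numpy as np

def prediction_interval(X_train, y_train, X_new, t_crit):
    """Return (prediction, lower, upper) for each row of X_new."""
    y_train = np.asarray(y_train, dtype=float)
    A = np.column_stack([np.ones(len(y_train)), X_train])
    A0 = np.column_stack([np.ones(len(X_new)), X_new])
    xtx_inv = np.linalg.inv(A.T @ A)
    beta = xtx_inv @ A.T @ y_train
    resid = y_train - A @ beta
    df = A.shape[0] - A.shape[1]          # error degrees of freedom
    s = np.sqrt(resid @ resid / df)       # standard error of the regression
    pred = A0 @ beta
    # x0' (X'X)^-1 x0 for each new row, then the prediction standard error.
    quad = np.einsum("ij,jk,ik->i", A0, xtx_inv, A0)
    se_pred = s * np.sqrt(1.0 + quad)
    return pred, pred - t_crit * se_pred, pred + t_crit * se_pred
```

The validation question is then whether each held-out actual value lies between its `lower` and `upper` limits.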
7
Regression Diagnostics
Model Assumptions: Residual plots or other diagnostics can be used to check the assumptions.
-- Plot of residuals versus each variable should be a random cloud. A U-shaped (or rainbow) pattern indicates a nonlinear relationship.
-- Plot of residuals versus predicted values should be a random cloud. A wedge shape indicates non-constant (increasing) variability.
-- Residuals should be mound-shaped (normal). Use skewness/kurtosis or a normal probability plot to check.
-- Plot of residuals versus time order (time series data) should be a random cloud. If D-W < 1.3, residuals are not independent.
Cook's D is a check for influential observations that may have large impacts on the equation. Check data for accuracy.
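For the independence check, the Durbin-Watson (D-W) statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, assuming NumPy and residuals already sorted in time order (the function name is hypothetical):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in time order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

Values near 2 are consistent with independent residuals; runs of same-signed residuals (positive autocorrelation) pull the statistic toward 0, which is what the D-W < 1.3 rule of thumb above flags.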
8
Detecting Influential Observations
Studentized Residuals: Residuals divided by their estimated standard errors. Observations in dark blue are considered outliers from the equation.
Leverage Values: A measure of how far an observation is from the others in terms of the levels of the independent variables (not the dependent variable). Observations in dark blue are considered to be outliers in the X values.
Cook's D: A measure of the aggregate impact of each observation on the group of regression coefficients, as well as on the group of fitted values. Values larger than 1 are considered highly influential.
Influential observations may suggest quarters to research, to see if something special happened that may call for a dummy variable.
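The three measures above can be computed together from the hat matrix. The sketch below uses the standard textbook formulas with internally studentized residuals; whether MegaStat uses exactly these variants is an assumption. It assumes NumPy, and the function name is hypothetical.

```python
import numpy as np

def influence_measures(X, y):
    """Return (leverage, studentized residuals, Cook's D) per observation."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p = A.shape
    xtx_inv = np.linalg.inv(A.T @ A)
    # Leverage: diagonal of the hat matrix A (A'A)^-1 A'.
    leverage = np.einsum("ij,jk,ik->i", A, xtx_inv, A)
    resid = y - A @ (xtx_inv @ A.T @ y)
    s2 = resid @ resid / (n - p)
    # Each residual divided by its own estimated standard error.
    studentized = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks_d = studentized ** 2 * leverage / (p * (1.0 - leverage))
    return leverage, studentized, cooks_d
```

Quarters whose Cook's D exceeds 1 would be the first candidates to research for a possible dummy variable.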
9
The Final Forecasts
Add 2013 Q1 and Q2 back into your data set, redo the equation using the same variables, and forecast 2013 Q3 to 2014 Q2. Recheck the assumptions now that you have 2 additional data points.
Do the forecasts make sense? You may have actual Q3 revenues to compare your forecast with. Superimpose your forecasts on a time series plot of revenues and ensure that the forecasts seem reasonable.
Document all your data and forecast sources. Write a report that documents all aspects of the forecasting process.