Eco 6380: Predictive Analytics for Economists
Spring 2016
Professor Tom Fomby
Department of Economics, SMU
Presentation 5
Building a Predictive Regression Model Using Data Mining Techniques (Backward, Forward, Stepwise, All Regressions, Cp Statistic)
Chapter 6 in SPB
Building a Data Mining Model: Example with Linear Regression – The Boston Housing Data Set
Selecting a Best Subset Regression
Selection methods: Forward, Backward, Stepwise, and All Subsets. See the file “Multiple Linear Regression and Subset Selection.pdf” on the class website.
The critical p-value acts as a “tuning parameter”: it sets how strong the evidence must be before a variable enters or leaves the equation.
Thus the best “architecture” of the regression equation is determined by two choices: the subset selection method and the value of that method's p-value tuning parameter.
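The role of the critical p-value as a tuning parameter can be sketched with a small forward-selection routine. This is an illustrative sketch only, not the course's SAS procedure: the function names `ols_pvalues` and `forward_select`, the parameter name `p_enter`, and the simulated data are all assumptions, and the p-values use a normal approximation to the t distribution to avoid extra dependencies.

```python
import math
import numpy as np

def ols_pvalues(design, y):
    """Two-sided coefficient p-values for y ~ design (normal approximation)."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    # P(|Z| > |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
    return np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

def forward_select(X, y, p_enter=0.05):
    """Add, one at a time, the candidate with the smallest entry p-value,
    stopping when no remaining candidate has a p-value <= p_enter."""
    n, m = X.shape
    selected = []
    while True:
        best = None
        for j in range(m):
            if j in selected:
                continue
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            p = ols_pvalues(design, y)[-1]  # p-value of the newly added column
            if best is None or p < best[1]:
                best = (j, p)
        if best is None or best[1] > p_enter:
            return selected
        selected.append(best[0])

# Simulated stand-in data (hypothetical): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(size=200)

loose = forward_select(X, y, p_enter=0.05)   # admits variables more readily
strict = forward_select(X, y, p_enter=1e-6)  # demands much stronger evidence
print("p_enter=0.05 ->", sorted(loose))
print("p_enter=1e-6 ->", sorted(strict))
```

Raising `p_enter` lets more variables in and lowering it yields a sparser equation, which is why the threshold is best chosen by checking performance on validation data rather than fixed in advance.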
Building a Model: Example with Linear Regression – The Boston Housing Data Set (continued)
Demonstrate the danger of over-fitting a multiple regression equation by running the SAS simulation program “Cross”.
The validation data set reveals the bogus nature of “spurious” regressions: a fit that looks strong in the training data fails to predict new data.
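The over-fitting danger can be illustrated with a small simulation. This is not the SAS program “Cross”, whose contents are not reproduced here; it is a hypothetical Python sketch in the same spirit, in which the response is pure noise, unrelated to any regressor, and the sample sizes and number of regressors are arbitrary assumptions.

```python
import numpy as np

def r_squared(y, y_hat):
    """1 - SSE/SST, with SST taken around the mean of the supplied y."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n_train, n_valid, k = 50, 50, 40           # many regressors, few observations
X = rng.normal(size=(n_train + n_valid, k))
y = rng.normal(size=n_train + n_valid)     # y is unrelated to X by construction

design = np.column_stack([np.ones(len(y)), X])
train, valid = slice(0, n_train), slice(n_train, None)

# Fit on the training partition only, then score both partitions.
beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
r2_train = r_squared(y[train], design[train] @ beta)
r2_valid = r_squared(y[valid], design[valid] @ beta)

print(f"training R^2:   {r2_train:.3f}")
print(f"validation R^2: {r2_valid:.3f}")
```

Because the regressors carry no information about the response, any apparent fit in the training partition comes from chance alone, and the validation R² exposes it.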
Evaluation of Predictive Performance
I. Various measures of predictive accuracy
II. Comparisons of predictive accuracy across competing prediction models in the validation data set
III. Is the predictive accuracy of one method statistically better than that of a competing method?
Now go directly to the file “Scoring Measures for Prediction Problems.pdf” on the class website
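As a rough illustration of the three points above, the sketch below computes two common accuracy measures (MAE and RMSE) for two competing models on a validation set, then forms a paired t-statistic on the difference in squared errors. The specific measures and test covered in the class file may differ; the function names and the tiny data set are hypothetical, and the t-statistic assumes the loss differences are independent across observations.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def paired_loss_t_stat(actual, pred_a, pred_b):
    """t-statistic for H0: equal mean squared error of models A and B.
    Negative values favor model A. Assumes independent loss differences."""
    d = [(a - pa) ** 2 - (a - pb) ** 2 for a, pa, pb in zip(actual, pred_a, pred_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical validation-set values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
pred_a = [2.0, 5.0, 8.0, 9.0]
pred_b = [4.0, 7.0, 5.0, 12.0]

print("MAE  A, B:", mae(actual, pred_a), mae(actual, pred_b))
print("RMSE A, B:", rmse(actual, pred_a), rmse(actual, pred_b))
print("paired t-statistic:", paired_loss_t_stat(actual, pred_a, pred_b))
```

With only four observations the t-statistic has no practical meaning; the point is the mechanics of moving from accuracy measures to a formal comparison.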
Building a Model: Example with Linear Regression – The Boston Housing Data Set (continued)
Rule-of-thumb adjustment of p-values in linear regressions, needed because subset selection methods amount to multiple testing.
See the paper on data mining by Michael C. Lovell that I have posted on the class website in the file Data Mining_Lovell.pdf and, in particular, equation (3).
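The rule of thumb usually attributed to Lovell's paper says that when the best k regressors have been chosen from c candidates, a coefficient that appears significant at nominal level α̂ should be treated as significant only at level α = 1 − (1 − α̂)^(c/k). Treating this as the equation the slide refers to is an assumption; check it against the posted file. The function name below is hypothetical.

```python
def lovell_adjusted_alpha(alpha_hat, c, k):
    """Approximate true significance level after searching for the best k
    regressors among c candidates, given nominal level alpha_hat."""
    if not 0 < k <= c:
        raise ValueError("need 0 < k <= c")
    if not 0.0 < alpha_hat < 1.0:
        raise ValueError("alpha_hat must lie strictly between 0 and 1")
    return 1.0 - (1.0 - alpha_hat) ** (c / k)

# Example: 2 regressors kept out of 10 candidates, nominal level 5%.
adjusted = lovell_adjusted_alpha(0.05, c=10, k=2)
print(f"nominal 0.05 -> adjusted {adjusted:.4f}")
```

When no search takes place (k equal to c) the adjustment leaves the nominal level unchanged, and the more candidates are searched per variable kept, the larger the true level becomes.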
Building a Model: Example with Linear Regression – The Boston Housing Data Set (continued)
Demonstrate the best subset selection methods on the Boston Housing Data Set using the SAS program “Boston Best Subset”, which can be found on the course website.
Demonstrate the best subset selection procedure in SPSS Modeler.
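For readers without SAS or SPSS Modeler, an all-subsets search ranked by Mallows' Cp can be sketched in Python. This is not the “Boston Best Subset” program. It runs on simulated stand-in data rather than the Boston Housing Data Set, and the function names are hypothetical. It uses the standard definition Cp = SSE_p / s² − n + 2p, where s² is the mean squared error of the full model and p counts the intercept.

```python
import itertools
import numpy as np

def sse(design, y):
    """Sum of squared OLS residuals for y ~ design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def all_subsets_cp(X, y):
    """Return (Cp, column-index tuple) for every subset, sorted by Cp."""
    n, m = X.shape
    ones = np.ones((n, 1))
    s2_full = sse(np.hstack([ones, X]), y) / (n - m - 1)
    results = []
    for size in range(m + 1):
        for cols in itertools.combinations(range(m), size):
            design = np.hstack([ones, X[:, list(cols)]])
            p = size + 1  # number of parameters, including the intercept
            cp = sse(design, y) / s2_full - n + 2 * p
            results.append((cp, cols))
    return sorted(results)

# Simulated stand-in data (hypothetical): only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=150)

ranked = all_subsets_cp(X, y)
best_cp, best_cols = ranked[0]
print("lowest-Cp subset:", best_cols, "Cp =", round(best_cp, 2))
```

The search fits 2^m regressions, so it is practical only for a modest number of candidate variables; forward, backward, and stepwise methods trade completeness for speed.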
Classroom Exercise: Exercise 3