OLS REGRESSION VS. NEURAL NETWORKS VS. MARS: A COMPARISON. R. J. Lievano and E. Kyper, University of Minnesota Duluth
Research Questions: Are new data mining regression techniques superior to classical regression? Can data analysis methods implemented naively (through default automated routines) yield useful results consistently?
Assessment of a $3 \times 2^3$ factorial experiment. Regression methods (3): OLS forward stepwise regression, feedforward neural networks, Multivariate Adaptive Regression Splines (MARS). Type of function (2): linear and nonlinear. Noise size (2): small, large. Sample size (2): small, large.
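The design crosses one three-level factor with three two-level factors, which gives 3 × 2 × 2 × 2 = 24 cells. A minimal Python sketch that enumerates them follows; the level labels come from the slide, while the dictionary keys and variable names are illustrative only.

```python
from itertools import product

# Factor levels as listed on the slide; the keys are illustrative names.
factors = {
    "method": ["OLS forward stepwise", "neural network", "MARS"],
    "function": ["linear", "nonlinear"],
    "noise": ["small", "large"],
    "sample_size": ["small", "large"],
}

# Each combination of one level per factor is one cell of the experiment.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(cells))  # 3 * 2 * 2 * 2 cells
```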
FORWARD STEPWISE REGRESSION. Given a set of responses $Y$ and predictors $X$ such that $Y = F(X) + \varepsilon$, where $\varepsilon$ is an error (noise) structure: Find a subset $X_R$ of $X$ which satisfies a set of conditions such as goodness-of-fit or simplicity. Fit a set of successive models of the type $Y_i = \sum_j \beta_j X_j + \varepsilon_i$. Stop when a specified criterion has been achieved, e.g. maximum adjusted $R^2$ or no remaining significant predictors.
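The procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the software used in the study: it stops when adjusted R^2 no longer improves (one of the two criteria named on the slide), the helper names are hypothetical, and the data are synthetic, with only predictors 0 and 2 entering the true model.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def adjusted_r2(X, y, cols):
    """Fit y on an intercept plus the chosen columns and return adjusted R^2."""
    n = len(y)
    rows = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    p = len(cols) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * rk for bk, rk in zip(beta, r)) for r in rows]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

def forward_stepwise(X, y):
    """Add one predictor at a time while adjusted R^2 keeps improving."""
    selected, best = [], adjusted_r2(X, y, [])
    remaining = list(range(len(X[0])))
    while remaining:
        score, j = max((adjusted_r2(X, y, selected + [k]), k) for k in remaining)
        if score <= best:
            break  # stopping criterion: no candidate improves adjusted R^2
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic example: 4 candidate predictors, only 0 and 2 are valid.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2 + 3 * r[0] - 2 * r[2] + rng.gauss(0, 0.5) for r in X]
selected, score = forward_stepwise(X, y)
print(selected, round(score, 3))
```

Because adjusted R^2 penalizes extra terms only mildly, a run like this may also admit a noise predictor, which is the kind of overfitting the experiment measures.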
MULTIVARIATE ADAPTIVE REGRESSION SPLINES (MARS). Given a set of responses $Y$ and predictors $X$ such that $Y = F(X) + \varepsilon$, where $\varepsilon$ is an error (noise) structure: Find a set of basis functions $W_j$ (spline transformations of $X_j$) which describe intervals of varying relationships between $X_j$ and $Y$. Fit these basis functions with a stepwise regression procedure to models that are linear in the $W_j$, until a stopping criterion has been achieved.
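MARS basis functions are commonly built from paired hinge functions, max(0, x − t) and max(0, t − x), around a knot t. The sketch below assumes that form; the knot, the coefficients, and the function names are hypothetical, and it shows only the transformation and a resulting piecewise-linear fit, not the knot search or the stepwise selection.

```python
def hinge_pair(x, knot):
    """Return the two hinge (spline) transforms of x around a knot."""
    return max(0.0, x - knot), max(0.0, knot - x)

def piecewise_model(x, knot, b0, b_right, b_left):
    """A model that is linear in the basis functions but piecewise linear in x."""
    right, left = hinge_pair(x, knot)
    return b0 + b_right * right + b_left * left

knot = 2.0  # hypothetical knot location
for x in [0.0, 1.0, 2.0, 3.5]:
    print(x, hinge_pair(x, knot), piecewise_model(x, knot, 1.0, 0.5, -2.0))
```

Each hinge is zero on one side of the knot, so the slope of the fitted relationship can differ between intervals of the predictor.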
NEURAL NETWORKS COMPONENTS. A neuron: the inputs $x_1, \dots, x_5$ are combined into a net input $I = 0.8 + 0.3x_1 + 0.7x_2 - 0.2x_3 + 0.4x_4 - 0.5x_5$, which is passed through a sigmoidal activation (transfer) function to produce the output $y$, sent to the next layer.
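The single-neuron computation can be written out directly. The weights and the constant 0.8 are those on the slide; the logistic function is assumed here as the sigmoidal activation (the slide does not say which sigmoid is used), and the input values are hypothetical.

```python
import math

BIAS = 0.8                              # constant term from the slide
WEIGHTS = [0.3, 0.7, -0.2, 0.4, -0.5]   # weights on x1..x5 from the slide

def net_input(x):
    """Weighted sum of the inputs plus the constant term."""
    return BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))

def sigmoid(i):
    """Logistic activation; assumed form of the sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-i))

x = [1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical input values
I = net_input(x)
y = sigmoid(I)
print(I, y)
```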
Overall (many nodes): inputs enter at the input layer, pass through the nodes of the hidden layer, and the hidden-node outputs feed the output layer. The resulting model is just a flexible nonlinear regression of the response on a set of predictor variables.
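A forward pass through such a network makes the "flexible nonlinear regression" reading concrete. This is a sketch under assumptions: one hidden layer with logistic activations, a linear output node, and hypothetical weights; training is not shown.

```python
import math

def sigmoid(z):
    """Logistic activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Prediction of a one-hidden-layer feedforward network with a linear output."""
    hidden = [
        sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))

# Hypothetical network: 2 predictors, 3 hidden nodes, 1 response.
hidden_weights = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [1.0, -2.0, 0.5]
output_bias = 0.3

prediction = forward([1.0, 2.0], hidden_weights, hidden_biases,
                     output_weights, output_bias)
print(prediction)
```

The prediction is a weighted sum of sigmoid transforms of linear combinations of the predictors, so the response surface is nonlinear in the inputs.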
Hypotheses: H1: The three methods are equivalent in accuracy (goodness-of-fit). H2: The three methods are equivalent in ability to select valid predictors. –H2a: The three methods are equivalent in the degree of underfitting. –H2b: The three methods are equivalent in the degree of overfitting.
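One simple way to quantify H2a and H2b is to count the valid predictors a fitted model leaves out (underfitting) and the invalid predictors it includes (overfitting). The sketch below assumes that count-based definition, which the slides do not spell out; the function name and predictor labels are hypothetical.

```python
def specification_errors(true_predictors, selected_predictors):
    """Return (number of valid predictors omitted, number of invalid predictors included)."""
    true_set = set(true_predictors)
    chosen = set(selected_predictors)
    omitted = true_set - chosen   # underfitting: valid predictors left out
    extra = chosen - true_set     # overfitting: invalid predictors let in
    return len(omitted), len(extra)

# Hypothetical example: the true model uses x1, x2, x3.
under, over = specification_errors(["x1", "x2", "x3"], ["x1", "x3", "x7", "x9"])
print(under, over)  # 1 omitted (x2), 2 extra (x7, x9)
```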
A SLICE OF $Y = \alpha + \sum_j \beta_j X_j + \varepsilon$ (Linear functional form modeled)
A SLICE OF $Y = \alpha + \sum_j \log_e(\beta_j X_j) + \varepsilon$ (Nonlinear functional form modeled)
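The two functional forms can be simulated as follows. This is a sketch only: the coefficients, the noise standard deviations, the Gaussian noise, and the uniform range of the predictors are all assumptions, and the predictors are kept positive so that the logarithm in the nonlinear form is defined.

```python
import math
import random

def linear_response(x, alpha, betas, noise_sd, rng):
    """Y = alpha + sum_j beta_j * X_j + noise."""
    return alpha + sum(b * xj for b, xj in zip(betas, x)) + rng.gauss(0.0, noise_sd)

def nonlinear_response(x, alpha, betas, noise_sd, rng):
    """Y = alpha + sum_j ln(beta_j * X_j) + noise; requires beta_j * X_j > 0."""
    return alpha + sum(math.log(b * xj) for b, xj in zip(betas, x)) + rng.gauss(0.0, noise_sd)

rng = random.Random(1)
alpha, betas = 1.0, [2.0, 3.0]   # hypothetical coefficients
small_noise, large_noise = 0.5, 5.0  # hypothetical noise levels

# Positive predictors so that the nonlinear form is well defined.
sample = [[rng.uniform(1.0, 10.0) for _ in betas] for _ in range(5)]
for x in sample:
    print(round(linear_response(x, alpha, betas, small_noise, rng), 2),
          round(nonlinear_response(x, alpha, betas, large_noise, rng), 2))
```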
ANOVA RESULTS: METHOD MEANS AND 0.95 INTERVALS
Results/Conclusions: H1 can be rejected (the three methods are not equivalent in accuracy). H2a cannot be rejected; underfitting is more prevalent in nonlinear fits with large noise for smaller samples. H2b can be rejected (the three methods are not equivalent in degree of overfitting).
Results (cont.): Linear PMSE: OLS regression. Linear overspecification: MARS. Nonlinear PMSE: NNW (neural networks). Nonlinear overspecification: MARS. Further study is needed to answer the research questions clearly.
Further Research Conducted: Kept the same three methods, with only large samples. Kept function as a factor but changed from two to three functions (1 linear, 2 nonlinear). Replaced noise with contamination (contaminated and uncontaminated data). Found that OLS regression performed best in all linear cases. Unlike the previous findings, MARS performed best in all nonlinear cases, and underspecification is now significant.