OLS REGRESSION VS. NEURAL NETWORKS VS. MARS: A COMPARISON
R. J. Lievano, E. Kyper, University of Minnesota Duluth

Research Questions
- Are new data mining regression techniques superior to classical regression?
- Can data analysis methods implemented naively (through default automated routines) consistently yield useful results?

Assessment of a 3 × 2 × 2 × 2 (3 × 2³) factorial experiment
- Regression method (3): OLS forward stepwise regression, feedforward neural networks, multivariate adaptive regression splines (MARS)
- Type of function (2): linear, nonlinear
- Noise size (2): small, large
- Sample size (2): small, large

FORWARD STEPWISE REGRESSION
Given a set of responses Y and predictors X such that Y = F(X) + ε, where ε is an error (noise) structure:
- Find a subset X_R of X which satisfies a set of conditions such as goodness-of-fit or simplicity.
- Fit a set of successive models of the form Y_i = Σ_j β_j X_ij + ε_i.
- Stop when a specified criterion has been achieved, e.g. maximum adjusted R², or no remaining significant predictors.
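A minimal sketch of this forward pass in Python (the function name and the adjusted-R² stopping rule shown here are illustrative choices, not taken from the paper): at each step it adds the single predictor that most improves adjusted R², and stops when no candidate helps.

```python
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y):
    """Greedy forward selection maximizing adjusted R^2.

    X: (n, p) array of candidate predictors; y: (n,) response.
    Returns the list of selected column indices.
    """
    selected, remaining = [], list(range(X.shape[1]))
    best_adj_r2 = -np.inf
    while remaining:
        # Score each candidate by the adjusted R^2 of the enlarged model.
        scores = []
        for j in remaining:
            cols = selected + [j]
            model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            scores.append((model.rsquared_adj, j))
        adj_r2, j_best = max(scores)
        if adj_r2 <= best_adj_r2:   # stopping criterion: no improvement
            break
        best_adj_r2 = adj_r2
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```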

MULTIVARIATE ADAPTIVE REGRESSION SPLINES (MARS)
Given a set of responses Y and predictors X such that Y = F(X) + ε, where ε is an error (noise) structure:
- Find a set of basis functions W_j (spline transformations of the X_j) which describe intervals of varying relationships between X_j and Y.
- Fit these basis functions with a stepwise regression procedure to models of the form Y = β_0 + Σ_j β_j W_j(X) + ε, until a stopping criterion has been achieved.
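MARS builds its basis functions W_j from mirrored hinge (spline) transformations max(0, x − t) and max(0, t − x) around knots t. A bare-bones sketch of the forward pass, assuming candidate knots at the observed data values (the backward pruning pass is omitted, and all names are illustrative):

```python
import numpy as np

def hinge_pair(x, knot):
    """The two mirrored hinge basis functions MARS pairs at a knot t."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def mars_forward(X, y, max_terms=10):
    """Greedy forward pass: repeatedly add the hinge pair (variable, knot)
    that most reduces the residual sum of squares."""
    n, p = X.shape
    B = [np.ones(n)]                        # start from the intercept basis
    while len(B) + 2 <= max_terms:
        best = None
        for j in range(p):
            for t in np.unique(X[:, j]):    # candidate knots at data values
                h1, h2 = hinge_pair(X[:, j], t)
                trial = np.column_stack(B + [h1, h2])
                beta = np.linalg.lstsq(trial, y, rcond=None)[0]
                resid = y - trial @ beta
                score = resid @ resid
                if best is None or score < best[0]:
                    best = (score, h1, h2)
        B.extend([best[1], best[2]])
    return np.column_stack(B)               # design matrix of basis functions
```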

NEURAL NETWORK COMPONENTS
[Figure: a single neuron with inputs x1 through x5 and bias input x0]
A neuron computes a weighted sum of its inputs, e.g. I = 0.8 + 0.3x1 + 0.7x2 - 0.2x3 + 0.4x4 - 0.5x5, then passes it through a sigmoidal activation (transfer) function σ(I), whose output is sent to the next layer.
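Using the weights from the slide, the neuron's computation is just the weighted sum I followed by the sigmoid σ(I) = 1/(1 + e^(−I)); a quick check in Python (the input values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights from the slide: bias 0.8 plus five input weights.
w0, w = 0.8, np.array([0.3, 0.7, -0.2, 0.4, -0.5])
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # example inputs (illustrative)

I = w0 + w @ x    # weighted sum: I = 0.8 + 0.3*x1 + 0.7*x2 - 0.2*x3 + 0.4*x4 - 0.5*x5
y = sigmoid(I)    # activation passed to the next layer
print(f"I = {I:.2f}, sigma(I) = {y:.3f}")
```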

[Figure: feedforward network with input layer, hidden layer, and output layer; each hidden node's output feeds the output node]
Overall (many nodes): the resulting model is just a flexible nonlinear regression of the response on a set of predictor variables.
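Stacking many such neurons gives the full model: hidden units apply the sigmoid to different weighted sums of the inputs, and the output layer linearly recombines them, which is exactly the flexible nonlinear regression described above. A minimal forward pass (the weights here are random placeholders; in practice they would be learned, e.g. by back-propagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, w2, b2):
    """One-hidden-layer regression network.

    X: (n, p) inputs; W1: (p, h) input-to-hidden weights; b1: (h,) biases;
    w2: (h,) hidden-to-output weights; b2: scalar output bias.
    """
    H = sigmoid(X @ W1 + b1)   # hidden layer: sigmoid of weighted sums
    return H @ w2 + b2         # output layer: linear combination of hidden units

# Illustrative shapes: 5 inputs, 3 hidden units, 1 output.
p, h = 5, 3
W1, b1 = rng.normal(size=(p, h)), rng.normal(size=h)
w2, b2 = rng.normal(size=h), rng.normal()
X = rng.normal(size=(4, p))
print(forward(X, W1, b1, w2, b2))   # one prediction per row of X
```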

Hypotheses
- H1: The three methods are equivalent in accuracy (goodness of fit).
- H2: The three methods are equivalent in their ability to select valid predictors.
  - H2a: The three methods are equivalent in the degree of underfitting.
  - H2b: The three methods are equivalent in the degree of overfitting.

A SLICE OF Y = α + Σ_j β_j X_j + ε (linear functional form modeled)
[Figure: surface slice of the linear response function]

A SLICE OF Y = α + Σ_j log_e(β_j X_j) + ε (nonlinear functional form modeled)
[Figure: surface slice of the nonlinear response function]
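Both simulated surfaces follow directly from the formulas on these two slides. A sketch of the data-generating step for the factorial experiment (the coefficient values, noise scales, sample sizes, and predictor range below are placeholders, since the paper's exact settings are not shown; predictors are kept positive so the log arguments are defined):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, beta, sigma, nonlinear=False, alpha=1.0):
    """Draw one sample from Y = alpha + sum_j beta_j X_j + eps (linear) or
    Y = alpha + sum_j log_e(beta_j X_j) + eps (nonlinear)."""
    X = rng.uniform(0.5, 5.0, size=(n, len(beta)))   # keep log() arguments positive
    if nonlinear:
        signal = alpha + np.log(X * beta).sum(axis=1)
    else:
        signal = alpha + X @ beta
    y = signal + rng.normal(scale=sigma, size=n)     # small vs. large noise via sigma
    return X, y

# One cell of the factorial design: nonlinear form, large noise, small sample.
X, y = simulate(n=50, beta=np.array([0.5, 1.0, 2.0]), sigma=2.0, nonlinear=True)
```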

ANOVA RESULTS: METHOD MEANS AND 0.95 INTERVALS
[Figure: plots of method means with 0.95 confidence intervals]
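The method comparison itself is an ANOVA on the experiment's outcome measures. A hedged sketch of how the method means and 0.95 intervals could be computed with statsmodels (the `results` data frame, with one row per simulated fit and columns `method` and `pmse`, is hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical results table: one row per fitted model in the experiment,
# e.g. results = pd.read_csv("simulation_results.csv")

def compare_methods(results):
    # One-way ANOVA of prediction MSE on method.
    fit = smf.ols("pmse ~ C(method)", data=results).fit()
    print(anova_lm(fit))
    # Method means with approximate 0.95 confidence intervals.
    for method, grp in results.groupby("method"):
        m, se = grp["pmse"].mean(), grp["pmse"].sem()
        print(f"{method}: {m:.3f} +/- {1.96 * se:.3f}")
```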

Results/Conclusions
- H1 can be rejected: the three methods are not equivalent in accuracy.
- H2a cannot be rejected; however, underfitting is more prevalent in nonlinear fits with large noise for smaller samples.
- H2b can be rejected: the three methods are not equivalent in the degree of overfitting.

Results (cont.)
- Best linear PMSE: OLS regression
- Most linear overspecification: MARS
- Best nonlinear PMSE: neural networks (NNW)
- Most nonlinear overspecification: MARS
Further study is needed to answer the research questions clearly.

Further Research Conducted
- Kept the same three methods, with only large samples.
- Kept function as a factor but changed from two to three functions (1 linear, 2 nonlinear).
- Replaced noise with contamination (contaminated and uncontaminated data).
- Found that OLS regression performed best in all linear cases.
- Unlike the previous findings, MARS now performed best in all nonlinear cases, and underspecification is now significant.