Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools (5.2 Input Selection, 5.3 Stopped Training)

Slides from: Doug Gray, David Poole
Presentation transcript:

Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools 5.2 Input Selection 5.3 Stopped Training 5.4 Other Modeling Tools (Self-Study)

Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: none.
Optimize complexity: stopped training.

Neural Network Prediction Formula
The prediction estimate is a bias estimate plus weighted hidden units, where each hidden unit applies the tanh activation function to a weighted combination of the inputs:
ŷ = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3, with Hi = tanh(ŵi0 + ŵi1·x1 + ŵi2·x2)
[Plot: the tanh activation function, an S-shaped curve bounded between -1 and 1.]

Neural Network Binary Prediction Formula
For a binary target, a logit link function is applied to the same combination:
logit(p̂) = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3
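As a concrete illustration, here is a minimal Python sketch of the binary prediction formula above. The hidden-unit weights reuse the initial estimates shown later in this section; the output bias and weights are made-up placeholders, not values from the course.

```python
import numpy as np

# Hidden-unit weights reuse the initial estimates shown later in this
# section; the output bias w0 and weights w are hypothetical placeholders.
W_hidden = np.array([[-1.50, -0.03, -0.07],   # H1: bias, x1, x2
                     [ 0.79, -0.17, -0.16],   # H2
                     [ 0.57,  0.05,  0.35]])  # H3
w0, w = -0.5, np.array([1.0, -1.0, 0.5])

def predict_probability(x1, x2):
    """Binary neural network prediction: tanh hidden units, logit link."""
    H = np.tanh(W_hidden @ np.array([1.0, x1, x2]))  # three hidden units
    logit = w0 + w @ H                               # bias + weighted units
    return 1.0 / (1.0 + np.exp(-logit))              # invert the logit

print(predict_probability(0.3, 0.7))                 # p-hat for one case
```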

Neural Network Diagram
Input layer (x1, x2) → hidden layer (H1, H2, H3) → target layer (y).

Prediction Illustration – Neural Networks
The logit equation defines predictions over the (x1, x2) input space, but it needs weight estimates. Weight estimates are found by maximizing the log-likelihood function. Probability estimates are then obtained by solving the logit equation for p̂ at each (x1, x2) point. [Plot: probability contours over the unit square, with values ranging from about 0.30 to 0.70.]
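To make "maximizing the log-likelihood" concrete, here is a small Python sketch. The toy data and the generic SciPy optimizer are assumptions standing in for Enterprise Miner's routines; only the network shape (three tanh hidden units, logit link) comes from the slides.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                  # toy (x1, x2) inputs
y = (X[:, 0] + X[:, 1] > 1).astype(float)       # toy binary target

def unpack(theta):
    """First 9 entries: 3 hidden units x (bias, w1, w2); last 4: output."""
    return theta[:9].reshape(3, 3), theta[9], theta[10:]

def neg_log_likelihood(theta):
    Wh, b_out, w_out = unpack(theta)
    H = np.tanh(np.c_[np.ones(len(X)), X] @ Wh.T)   # hidden units
    p = 1 / (1 + np.exp(-(b_out + H @ w_out)))      # invert the logit
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

theta0 = rng.normal(scale=0.1, size=13)          # random initial weights
fit = minimize(neg_log_likelihood, theta0)       # maximize log-likelihood
print(fit.fun)
```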

Neural Nets: Beyond the Prediction Formula Manage missing values. Handle extreme or unusual values. Use non-numeric inputs. Account for nonlinearities. Interpret the model.

Training a Neural Network This demonstration illustrates how to use the Neural Network tool.

Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools 5.2 Input Selection 5.3 Stopped Training 5.4 Other Modeling Tools (Self-Study)

Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: sequential selection (in place of none).
Optimize complexity: best model from sequence.

5.01 Multiple Answer Poll
Which of the following are true about neural networks in SAS Enterprise Miner?
1. Neural networks are universal approximators.
2. Neural networks have no internal, automated process for selecting useful inputs.
3. Neural networks are easy to interpret and thus are very useful in highly regulated industries.
4. Neural networks cannot model nonlinear relationships.

5.01 Multiple Answer Poll – Correct Answers
Statements 1 and 2 are true: neural networks are universal approximators, and they have no internal, automated process for selecting useful inputs. (They are difficult to interpret, and they can model nonlinear relationships.)

Selecting Neural Network Inputs This demonstration illustrates how to use a logistic regression to select inputs for a neural network.
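Outside Enterprise Miner, the same workflow can be sketched in Python with scikit-learn (an analogy, not the course's tooling): a logistic regression drives forward sequential selection, and the neural network is then fit on the selected inputs only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Forward sequential selection driven by a logistic regression,
# mirroring the idea of letting a regression choose the inputs.
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select=5,
                                     direction="forward")
X_selected = selector.fit_transform(X, y)

# Fit the neural network on the selected inputs only.
nn = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
nn.fit(X_selected, y)
print(selector.get_support(), nn.score(X_selected, y))
```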

Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools 5.2 Input Selection 5.3 Stopped Training 5.4 Other Modeling Tools (Self-Study)

Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: stopped training.

Fit Statistic versus Optimization Iteration
At iteration 0, the hidden units have random initial input weights and biases, and the output weights are zero, so the model predicts the same value for every case:
logit(p̂) = ŵ00 + 0·H1 + 0·H2 + 0·H3, with ŵ00 = logit(ρ̂1), the logit of the overall primary-outcome proportion
H1 = tanh(-1.50 - 0.03·x1 - 0.07·x2)
H2 = tanh( 0.79 - 0.17·x1 - 0.16·x2)
H3 = tanh( 0.57 + 0.05·x1 + 0.35·x2)

Fit Statistic versus Optimization Iteration
[Plot: average squared error (ASE) versus optimization iteration, 1 through 23, for the training and validation data.] Training ASE decreases steadily as the iterations proceed. Validation ASE decreases at first, reaches its minimum near iteration 12, and then rises as the network begins to overfit. Stopped training selects the weight estimates from the iteration with the smallest validation ASE (iteration 12 here); the resulting prediction surface has probability contours from roughly 0.30 to 0.70 across the input space.
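Here is a minimal Python sketch of stopped training, assuming scikit-learn rather than Enterprise Miner: train one iteration at a time, track validation ASE, and restore the weights from the best iteration.

```python
import copy
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# warm_start + max_iter=1 runs one optimization iteration per fit() call.
nn = MLPClassifier(hidden_layer_sizes=(3,), max_iter=1, warm_start=True,
                   random_state=0)

best_ase, best_weights, best_iter = np.inf, None, 0
for iteration in range(1, 24):
    nn.fit(X_tr, y_tr)                                # one more iteration
    p_va = nn.predict_proba(X_va)[:, 1]
    ase = np.mean((y_va - p_va) ** 2)                 # validation ASE
    if ase < best_ase:                                # new best iteration
        best_ase, best_iter = ase, iteration
        best_weights = copy.deepcopy((nn.coefs_, nn.intercepts_))

# Restore the weights from the iteration with minimum validation ASE.
nn.coefs_, nn.intercepts_ = best_weights
print(f"stopped at iteration {best_iter}, validation ASE {best_ase:.4f}")
```

scikit-learn's MLPClassifier also has a built-in early_stopping option based on a held-out validation fraction, which automates much of the loop above.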

Increasing Network Flexibility This demonstration illustrates how to further improve neural network performance.

Using the AutoNeural Tool (Self-Study) This demonstration illustrates how to use the AutoNeural tool.

Chapter 5: Introduction to Predictive Modeling: Neural Networks and Other Modeling Tools 5.2 Input Selection 5.3 Stopped Training 5.4 Other Modeling Tools (Self-Study)

Model Essentials – Rule Induction
Predict new cases: prediction rules / prediction formula.
Select useful inputs: split search / none.
Optimize complexity: ripping / stopped training.

Rule Induction Predictions
Rips create prediction rules: a binary model sequentially classifies and removes correctly classified cases, and a neural network then predicts the remaining cases. [Plot: prediction surface over (x1, x2), with region estimates such as 0.74 and 0.39.]
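Enterprise Miner's Rule Induction node is proprietary, so purely as an illustration of the classify-and-remove idea, here is a hedged Python sketch (the purity threshold, tree depth, and data are all assumptions): a shallow tree proposes high-purity rules, the cases those rules cover are set aside, and a neural network handles the rest.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)

# "Ripping": a shallow tree proposes rules; nearly pure leaves become
# prediction rules, and the cases they cover are set aside.
tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)
covered = tree.predict_proba(X).max(axis=1) >= 0.95   # cases claimed by a rule

# A neural network is trained on (and later scores) the remaining cases.
nn = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=1)
nn.fit(X[~covered], y[~covered])

def score(X_new):
    """Rule prediction where a pure leaf applies; otherwise the network."""
    pure = tree.predict_proba(X_new).max(axis=1) >= 0.95
    p = nn.predict_proba(X_new)[:, 1]
    p[pure] = tree.predict_proba(X_new)[pure, 1]
    return p

print(score(X[:5]))
```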

Model Essentials – Dmine Regression
Predict new cases: prediction formula.
Select useful inputs: forward selection.
Optimize complexity: stop R-square.

Dmine Regression Predictions
Interval inputs are binned and categorical inputs are grouped. Forward selection then picks from both the binned and the original inputs.
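A toy Python sketch of the idea (not PROC DMINE itself; the bin count and stop threshold are assumptions): bin each interval input into quantile groups, then run greedy forward selection over the binned and original inputs, stopping when the incremental R-square falls below the "stop R-square".

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=300), rng.uniform(size=300)
y = np.sin(6 * x1) + 0.1 * rng.normal(size=300)      # nonlinear in x1 only

# Candidate inputs: the originals plus dummy-coded quantile bins of each.
candidates = {
    "x1": x1.reshape(-1, 1),
    "x2": x2.reshape(-1, 1),
    "bin(x1)": pd.get_dummies(pd.qcut(x1, 8, labels=False)).values,
    "bin(x2)": pd.get_dummies(pd.qcut(x2, 8, labels=False)).values,
}

def r_square(X):
    return LinearRegression().fit(X, y).score(X, y)

# Greedy forward selection; stop when the R-square gain is too small.
stop_r2, selected, current_r2 = 0.005, [], 0.0
while len(selected) < len(candidates):
    scores = {n: r_square(np.hstack([candidates[m] for m in selected + [n]]))
              for n in candidates if n not in selected}
    best = max(scores, key=scores.get)
    if scores[best] - current_r2 < stop_r2:
        break                                # gain below the stop R-square
    selected.append(best)
    current_r2 = scores[best]

print("selected:", selected, "R-square:", round(current_r2, 3))
```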

Model Essentials – DMNeural
Predict new cases: stagewise prediction formula.
Select useful inputs: principal components.
Optimize complexity: max stage.

DMNeural Predictions
Up to three principal components with the highest target R-square are selected. One of eight continuous transformations is selected and applied to the chosen components. The process is repeated three times, with each stage fit to the residuals from the previous stage. [Plot: resulting prediction surface over (x1, x2).]
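PROC DMNEURAL's internals are proprietary; the sketch below only illustrates the stagewise idea, and the transformation menu, one-component-per-stage simplification, and toy data are all assumptions. Each stage picks the principal component and transformation that best fit the current residual, adds that fit to the prediction, and refits on the new residual.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

# Principal components of the inputs.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
PC = Xc @ Vt.T

# A small, assumed menu of continuous transformations.
transforms = {"identity": lambda z: z, "square": lambda z: z ** 2,
              "tanh": np.tanh, "sin": np.sin}

def fit_1d(z, target):
    """Least-squares intercept/slope of target on z; returns (pred, sse)."""
    A = np.c_[np.ones_like(z), z]
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    pred = A @ coef
    return pred, np.sum((target - pred) ** 2)

residual, prediction = y.copy(), np.zeros_like(y)
for stage in range(3):                        # max stage = 3
    best = min(((fit_1d(t(PC[:, j]), residual), name, j)
                for j in range(PC.shape[1])
                for name, t in transforms.items()),
               key=lambda item: item[0][1])   # smallest residual SSE
    (pred, sse), name, j = best
    prediction += pred                        # stagewise additive update
    residual = y - prediction
    print(f"stage {stage}: {name}(PC{j}), SSE {sse:.2f}")
```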

Model Essentials – Least Angle Regression
Predict new cases: prediction formula.
Select useful inputs: generalized sequential selection.
Optimize complexity: penalized best fit.

Least Angle Regression Predictions
Inputs are selected using a generalization of forward selection. An input combination in the sequence with optimal, penalized validation assessment is selected by default. [Plot: resulting prediction surface over (x1, x2).]
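A Python sketch using scikit-learn's lars_path as a stand-in for the LARS node: trace the full sequence of input combinations, then pick the step with the best held-out fit. Validation SSE here is an assumed substitute for the node's penalized assessment criterion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=15, n_informative=4,
                       noise=5.0, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# Full LARS path: coefs has one column per step of the sequence.
alphas, active, coefs = lars_path(X_tr, y_tr, method="lar")

# Pick the step whose coefficients give the best validation fit
# (validation SSE stands in for the penalized assessment).
val_sse = [np.sum((y_va - X_va @ coefs[:, k]) ** 2)
           for k in range(coefs.shape[1])]
best = int(np.argmin(val_sse))
print(f"best step {best}, inputs in model: {np.flatnonzero(coefs[:, best])}")
```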

Model Essentials – MBR
Predict new cases: training data nearest neighbors.
Select useful inputs: none.
Optimize complexity: number of neighbors.

MBR Prediction Estimates
The sixteen nearest training data cases predict the target for each point in the input space. Scoring requires the training data and the PMBR procedure. [Plot: resulting prediction surface over (x1, x2).]
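Outside Enterprise Miner, memory-based reasoning is ordinary k-nearest-neighbors prediction. A minimal scikit-learn sketch, with 16 neighbors to match the slide (the data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=2, n_redundant=0,
                           random_state=0)

# The 16 nearest training cases vote on each prediction; the training
# data itself is the "model", so scoring needs it on hand.
mbr = KNeighborsClassifier(n_neighbors=16)
mbr.fit(X, y)
print(mbr.predict_proba(X[:3]))     # class proportions among 16 neighbors
```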

Model Essentials – Partial Least Squares
Predict new cases: prediction formula.
Select useful inputs: variable importance in the projection (VIP).
Optimize complexity: sequential factor extraction.

Partial Least Squares Predictions
Input combinations (factors) that optimally account for both predictor and response variation are extracted successively. The factor count with the minimum validation PRESS statistic is selected. Inputs with small VIP are rejected for subsequent diagram nodes. [Plot: resulting prediction surface over (x1, x2).]
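A Python sketch of the factor-count choice, with scikit-learn's PLSRegression standing in for the PLS node: extract factors sequentially and keep the count that minimizes PRESS (the sum of squared prediction errors) on held-out validation data. The data and the factor range searched are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# Extract factors sequentially; keep the count minimizing validation PRESS.
press = {}
for n_factors in range(1, 9):
    pls = PLSRegression(n_components=n_factors).fit(X_tr, y_tr)
    resid = y_va - pls.predict(X_va).ravel()
    press[n_factors] = np.sum(resid ** 2)       # PRESS on validation data

best = min(press, key=press.get)
print(f"{best} factors, PRESS {press[best]:.1f}")
```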

Exercises This exercise reinforces the concepts discussed previously.

Neural Network Tool Review Create a multilayer perceptron on selected inputs. Control complexity with stopped training and the hidden unit count.