Neural Networks – Hands On
- Delta rule and backpropagation algorithm
- MetaNeural format for predictive data mining
- Iris data
- Magnetocardiogram data
Neural Net Yields Weights to Map Inputs to Outputs
[Figure: a neural network maps molecular descriptors (molecular weight, boiling point, H-bonding, hydrophobicity, electrostatic interactions) through hidden neurons and weights (w11, w23, w34, ...) to an observable projection, the biological response]
There are many algorithms that can determine the weights for ANNs.
McCulloch-Pitts Neuron
[Figure: inputs x1, x2, x3, ..., xN are multiplied by weights w, summed (S), and passed through a transfer function f() to produce the output y]
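As a worked illustration of the diagram above (not code from the course material): the classic McCulloch-Pitts neuron forms the weighted sum S = sum_i w_i x_i and passes it through a hard threshold f(); later slides replace the threshold with a smooth sigmoid so the weights can be trained. The inputs, weights, and threshold below are hypothetical.

    import numpy as np

    def mp_neuron(x, w, theta=0.5):
        # Classic McCulloch-Pitts neuron: fire (output 1) if the weighted
        # sum of the inputs reaches the threshold theta, else output 0.
        s = np.dot(w, x)                 # S = sum_i w_i * x_i
        return 1 if s >= theta else 0    # f() as a hard threshold

    # Illustrative (hypothetical) inputs and weights
    x = np.array([1, 0, 1])
    w = np.array([0.4, 0.3, 0.3])
    print(mp_neuron(x, w))               # -> 1, since 0.4 + 0.3 = 0.7 >= 0.5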
Neural Network as a Collection of M-P Neurons
[Figure: inputs x1, x2, x3 are connected through weights (w11, w12, w13, w21, w22, w23, w32, ...) to a first hidden layer, a second hidden layer, and an output neuron that produces y]
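Because the deck is a hands-on treatment of the delta rule and backpropagation, a minimal sketch of training such a layered network may help; this is not the Analyze/MetaNeural implementation, and the layer sizes, learning rate, initialization range, and toy data are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative layer sizes: 2 inputs, two hidden layers (4 and 3 neurons), 1 output
    sizes = [2, 4, 3, 1]
    W = [rng.uniform(-0.2, 0.2, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        # Forward pass: store the activation of every layer
        acts = [x]
        for w in W:
            acts.append(sigmoid(w @ acts[-1]))
        return acts

    def backprop(x, t, eta=0.5):
        # One backpropagation (generalized delta rule) update on a single pattern
        acts = forward(x)
        delta = (acts[-1] - t) * acts[-1] * (1.0 - acts[-1])     # output-layer delta
        for layer in reversed(range(len(W))):
            grad = np.outer(delta, acts[layer])                                   # dE/dW for this weight layer
            delta = (W[layer].T @ delta) * acts[layer] * (1.0 - acts[layer])      # delta for the layer below
            W[layer] -= eta * grad                                                # gradient-descent step

    # Toy training loop on a simple Boolean (OR) problem; the squared error should shrink
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [1]], dtype=float)

    def mse():
        return np.mean([(forward(x)[-1][0] - t[0]) ** 2 for x, t in zip(X, T)])

    print("error before training:", mse())
    for epoch in range(5000):
        for x, t in zip(X, T):
            backprop(x, t)
    print("error after training: ", mse())

The update implements the generalized delta rule: each layer's error signal (delta) is propagated backward through the weights and used for a small gradient-descent step on that layer's weight matrix.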
Standard Data Mining Terminology
Basic Terminology
- MetaNeural format
- Descriptors, features, response (or activity), and ID
- Classification versus regression
- Modeling / feature detection
- Training / validation / calibration
- Vertical and horizontal view of data
Outliers, rare events, and minority classes
Data Preparation
- Data cleansing
- Scaling
Leave-one-out and leave-several-out validation
Confusion matrix and ROC curves (see the sketch below)
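A hedged sketch of the last two items, leave-one-out validation and the confusion matrix, using the Iris data; scikit-learn and the small MLP classifier are illustrative choices, not the procedure built into the Analyze code.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_predict
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import confusion_matrix

    X, y = load_iris(return_X_y=True)

    # Leave-one-out: each pattern is held out once and predicted by a model
    # trained on the remaining 149 patterns.
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())

    # Confusion matrix: rows = true class, columns = predicted class
    print(confusion_matrix(y, pred))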
TERMINOLOGY
Standard Data Mining Problem
- Header and data
- MetaNeural format
  - descriptors and/or features
  - response (or activity to predict)
  - pattern ID
  - data matrix
- Validation / calibration
- Training / validation / test set
Demo: iris_view.bat
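The slide does not spell out the exact column order of a MetaNeural-format file; assuming each row carries the descriptor values followed by the response and the pattern ID, a minimal reader could look like this (the file name and column order are assumptions for illustration):

    import numpy as np

    def read_metaneural(path):
        # Assumed layout per row: descriptor_1 ... descriptor_d  response  pattern_id
        raw = np.loadtxt(path)          # whitespace-delimited numeric table
        descriptors = raw[:, :-2]       # all but the last two columns
        response = raw[:, -2]           # activity / class to predict
        pattern_id = raw[:, -1]         # identifier of each pattern
        return descriptors, response, pattern_id

    # Hypothetical file name for the Iris patterns
    # X, y, ids = read_metaneural("iris.pat")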
UC IRVINE DATA REPOSITORY
Datafile Name: Fisher's Iris
Datafile Subjects: Agriculture, Famous datasets
Description: This is a dataset made famous by Fisher, who used it to illustrate principles of discriminant analysis. It contains 6 variables with 150 observations.
Reference: Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179-188.
Story Names: Fisher's Irises
Authorization: free use
Number of cases: 150
Variable Names:
1. Species_No: Flower species as a code
2. Species_Name: Species name
3. Petal_Width: Petal width
4. Petal_Length: Petal length
5. Sepal_Width: Sepal width
6. Sepal_Length: Sepal length
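To inspect these variables directly, the data can be loaded as below; using scikit-learn's bundled copy of the Iris data is a convenience assumption (the course material ships its own iris pattern files).

    from sklearn.datasets import load_iris

    iris = load_iris()
    print(iris.feature_names)    # sepal length, sepal width, petal length, petal width (cm)
    print(iris.target_names)     # setosa, versicolor, virginica
    print(iris.data.shape)       # (150, 4): 150 observations, 4 measurements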
- The ANALYZE code has neural network modules built in
- Either run:
  analyze root.pat 4331 (single training and testing)
  analyze root.pat 4332 (LOO)
  analyze root.txt 4333 (bootstrap mode)
- Results for analyze are in resultss.xxx and resultss.ttt
- Note that patterns have to be properly scaled first
- The file name meta overrides the default input file for analyze
Neural Network Module in the Analyze Code
[Figure: file flow of the Analyze code – it reads ROOT, ROOT.PAT, ROOT.TES, and optionally ROOT.WGT, ROOT.FWT, ROOT.DBD, and writes resultss.XXX, resultss.TTT, ROOT.TRN, ROOT.WGT, ROOT.FWT, and optionally ROOT.DBD]
Use "analyze root 4331" for the easy way (the file meta lets you override defaults).
MetaNeural Input File for the ROOT
4         => 4 layers
2         => 2 inputs
16        => # hidden neurons in layer #1
4         => # hidden neurons in layer #2
1         => # outputs
300       => epoch length (hint: always use 1, for the entire batch)
0.01      => learning parameters by weight layer (hint: 1/# patterns or 1/# epochs)
0.01
0.5       => momentum parameters by weight layer (hint: use 0.5)
0.5
10000000  => some very large number of training epochs
200       => error display refresh rate
1         => sigmoid transfer function
1         => temperature of sigmoid
check.pat => name of file with training patterns (test patterns in root.tes)
0         => not used (legacy entry)
100       => not used (legacy entry)
0.02000   => exit training if error < 0.02
0         => initial weights from a flat random distribution
0.2       => initial random weights all fall between –2 and +2
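To connect the sigmoid temperature and momentum entries above to the underlying formulas, here is a minimal sketch; the exact conventions of the MetaNeural/Analyze code (for example whether the temperature divides the activation) are assumptions, and this only illustrates the standard textbook forms.

    import numpy as np

    def sigmoid(a, temperature=1.0):
        # Sigmoid transfer function with a temperature parameter;
        # assumed here as f(a) = 1 / (1 + exp(-a / T)).
        return 1.0 / (1.0 + np.exp(-a / temperature))

    def momentum_update(w, grad, prev_step, eta=0.01, mu=0.5):
        # Gradient-descent step with momentum:
        #   delta_w(t) = -eta * dE/dw + mu * delta_w(t-1)
        step = -eta * grad + mu * prev_step
        return w + step, step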
Generating and Scaling Iris Data
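The earlier slide notes that patterns must be properly scaled before training. A common choice is to map each descriptor to a fixed range; the [0, 1] min-max scaling below is an assumption, not necessarily the exact scaling the Analyze code expects.

    import numpy as np
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)

    # Min-max scale every descriptor column to [0, 1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_scaled = (X - lo) / (hi - lo)

    # Scale the class labels (0, 1, 2) to [0, 1] as well, so a sigmoid output can fit them
    y_scaled = y / y.max()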
Run Neural Net for Iris Data
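As a rough stand-in for the single training-and-testing run ("analyze root.pat 4331"), the sketch below trains a small feed-forward network on a random train/test split of min-max-scaled Iris data with scikit-learn; the hidden-layer sizes mirror the 16- and 4-neuron layers of the example input file, but the library, split, and settings are illustrative assumptions rather than the Analyze run itself.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import confusion_matrix

    X, y = load_iris(return_X_y=True)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))   # min-max scaling as above

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Hidden-layer sizes (16, 4) mirror the example MetaNeural input file above
    net = MLPClassifier(hidden_layer_sizes=(16, 4), max_iter=5000, random_state=0)
    net.fit(X_train, y_train)

    print("test accuracy:", net.score(X_test, y_test))
    print(confusion_matrix(y_test, net.predict(X_test)))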