EE749 Neural Networks: How to Design a Good-Performance NN?
Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University

Glossary
– Pattern: a complete set of data inputs and outputs that provides a snapshot of the system being modelled (also called an example or case)
– Feature: an identifying characteristic in the data that the model will ideally capture
– Domain: a set of boundaries that defines the range of expected/observed data for a particular model or problem

Backpropagation Modelling Heuristics
– Selection of model inputs and outputs
– Defining the model domain
– Data pre-processing
– Selection of training/testing cases
– Data scaling
– Number of hidden layers
– Number of neurons
– Activation function selection
– Initial weight values
– Learning rate and momentum
– Presentation of patterns
– Stopping criteria for training
– Improving model performance

Selection of Model Inputs and Outputs
– Start with a set of inputs that are KNOWN to affect the process, then add other inputs suspected of having a relationship with the process one at a time
– Eliminate input variables that are redundant (high covariance); see the sketch below
– Eliminate data patterns that do not contribute to training (no new information)
– Identify and eliminate data patterns that have the same inputs but different outputs
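
A minimal sketch of the redundancy screen, assuming the patterns are held in a NumPy array with one column per input; the 0.95 correlation threshold and the function name are illustrative choices, not from the slides:

```python
import numpy as np

def drop_redundant_inputs(X, names, threshold=0.95):
    """Keep only inputs that are not highly correlated with an already-kept input.

    X: (n_patterns, n_inputs) array; names: list of column names.
    """
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j, name in enumerate(names):
        # Drop input j if it nearly duplicates an input we already kept.
        if all(abs(corr[j, names.index(k)]) < threshold for k in keep):
            keep.append(name)
    return keep
```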

Defining the Model Domain
– ANNs should be confined to a limited domain; develop separate models for contradictory areas of the domain
– Build models that predict a single output; link multiple models together if required

Data Pre-processing
– Any time the dynamic range of an input spans more than a few orders of magnitude, a logarithmic transformation should be applied
– Transformations may also be useful for "reconditioning" input data when an ANN has trouble converging
– Non-numerical inputs need to be codified (encoded numerically); a sketch of both steps follows
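
A minimal sketch of both pre-processing steps, assuming NumPy; the flow values and category labels are hypothetical:

```python
import numpy as np

# Hypothetical input spanning several orders of magnitude
flow = np.array([0.3, 12.0, 450.0, 9800.0])
log_flow = np.log10(flow)  # compress the dynamic range

# Codify a non-numerical input with one-hot encoding
categories = ["low", "medium", "high"]
labels = ["medium", "low", "high", "medium"]
one_hot = np.array([[1.0 if c == lab else 0.0 for c in categories]
                    for lab in labels])
```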

Selection of Training/Testing Cases
– Training cases should be representative of the problem domain
– For good generalization capacity, the training set must be complete: all important variables must be measured
– The data set is randomly divided into training and testing sets (in a 70:30 ratio, for example); a minimal split is sketched below
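
A minimal sketch of the random 70:30 division, assuming NumPy; the fraction and seed are illustrative:

```python
import numpy as np

def train_test_split(patterns, train_fraction=0.7, seed=0):
    """Randomly divide a list of data patterns into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patterns))
    n_train = int(train_fraction * len(patterns))
    train = [patterns[i] for i in idx[:n_train]]
    test = [patterns[i] for i in idx[n_train:]]
    return train, test
```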

Rules of thumb
– The number of training patterns should be at least 5 times the number of nodes in the network
– The number of training cases should be roughly equal to the number of weights multiplied by the inverse of the accuracy parameter e (e = 0.9 means 90% prediction accuracy is required)
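
The two rules of thumb as a sketch, assuming a single-hidden-layer network with bias weights (the weight-counting convention is an assumption, not from the slides):

```python
def min_training_patterns(n_nodes):
    """Rule of thumb: at least 5 training patterns per network node."""
    return 5 * n_nodes

def cases_from_weights(n_inputs, n_hidden, n_outputs, e=0.9):
    """Rule of thumb: cases ~ number of weights times 1/e,
    where e is the accuracy parameter (e = 0.9 for 90% accuracy)."""
    n_weights = (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
    return round(n_weights / e)
```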

Data Scaling
– Output variables should be scaled into the 0.1 to 0.9 range; this avoids operating in the saturation range of the sigmoid function (see the sketch below)
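
A minimal sketch of linear scaling into the 0.1 to 0.9 range, plus the inverse mapping for reading predictions back in engineering units:

```python
import numpy as np

def scale_to_range(y, lo=0.1, hi=0.9):
    """Linearly scale values into [0.1, 0.9] to avoid sigmoid saturation."""
    y = np.asarray(y, dtype=float)
    return lo + (hi - lo) * (y - y.min()) / (y.max() - y.min())

def unscale(y_scaled, y_min, y_max, lo=0.1, hi=0.9):
    """Map scaled predictions back to the original units."""
    return y_min + (y_scaled - lo) * (y_max - y_min) / (hi - lo)
```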

Number of Hidden Layers
– Increasing the number of hidden layers increases both the time and the number of patterns (examples) required for training
– For most problems one hidden layer will suffice; problems can always be solved with two
– Multiple slabs (Ward Nets) supposedly increase processing power; each slab (group of neurons) acts as a detector for one or more input features

Number of Neurons
– One neuron in the input layer for each input
– One neuron in the output layer for each output
– The proper number of hidden neurons is often determined experimentally: too few = poor ability to capture features; too many = poor ability to generalize (the ANN simply memorizes the training data)
– Various rules of thumb have been reported: 0.75·N and 2·N + 1, where N = number of inputs (see the sketch below)
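
A sketch of the experimental approach, assuming a hypothetical train_and_evaluate(n_hidden) helper that trains a network and returns its testing error:

```python
def pick_hidden_size(n_inputs, train_and_evaluate):
    """Compare the rule-of-thumb hidden-layer sizes (0.75*N and 2*N + 1)
    and keep the one with the smallest testing error."""
    candidates = {max(1, round(0.75 * n_inputs)), 2 * n_inputs + 1}
    errors = {n: train_and_evaluate(n) for n in candidates}
    return min(errors, key=errors.get)
```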

– In general, more weights introduce more local minima into the error surface
– Flat regions in the error surface can mislead gradient-based error-minimization methods such as backpropagation
– Start with a small network and then add connections as needed; this avoids convergence problems as networks become too large
– The optimum ratio of hidden neurons in the first to the second hidden layer is 3:1

Activation Function Selection
– Sigmoidal (logistic) activation functions are the most widely used
– Thresholding functions are only useful for categorical outputs

Initial Weight Values
– Weights need to be randomized initially: if all weights are set to the same number, the generalized delta rule (GDR) would never be able to leave the starting point
– The backpropagation algorithm may also have difficulty if the connection weights are prematurely saturated (> 0.9)
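
A minimal sketch of random initialization; the uniform ±0.5 range is an illustrative choice to keep early activations out of the saturation region:

```python
import numpy as np

def init_weights(n_in, n_out, scale=0.5, seed=0):
    """Small random weights break symmetry so the generalized delta rule
    can move each weight independently from the start."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=(n_in, n_out))
```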

Learning Rate and Momentum
– The learning rate (η) affects the speed of convergence
– If it is large (> 0.5), the weights are changed more drastically, but the optimum may be overshot
– If it is small (< 0.2), the weights are changed in smaller increments, so the system converges more slowly but with little oscillation
– The best training occurs when the learning rate is as large as possible without leading to oscillation

– The learning rate can be increased as learning progresses, or a momentum term can be added, to improve network learning speed
– The momentum factor (α) can dampen out the oscillations caused by increasing the learning rate; a momentum of 0.9 allows higher learning rates (see the sketch below)
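
A minimal sketch of the weight update with momentum, assuming the gradient has already been computed by backpropagation; the η and α values are illustrative:

```python
def momentum_update(w, grad, velocity, eta=0.2, alpha=0.9):
    """Gradient-descent step with a momentum term:
    delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)."""
    velocity = -eta * grad + alpha * velocity
    return w + velocity, velocity
```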

Presenting Patterns to the ANN
– Present patterns in a random fashion (a minimal sketch follows)
– If input patterns can be easily classified, do not train the ANN on all patterns in a class in succession; the ANN will forget information as it moves from class to class
– Shaping can be used to improve network training: start with a very small, cohesive data set and then add patterns with greater deviations from the variable means as training progresses
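
A minimal sketch of random presentation, reshuffling the training patterns each epoch so that no class is presented in a long run:

```python
import random

def epochs_in_random_order(patterns, n_epochs):
    """Yield the training patterns in a fresh random order each epoch."""
    for _ in range(n_epochs):
        order = patterns[:]  # copy so the original order is preserved
        random.shuffle(order)
        for pattern in order:
            yield pattern
```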

Stopping Criteria for Training
Training should be stopped when one of the following conditions is met (sketched below):
– the testing error is sufficiently small
– the testing error begins to increase
– a set number of iterations has passed
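
These criteria as a training loop, assuming hypothetical train_one_epoch() and testing_error() helpers; the simple "error rises once" test stands in for a more patient check:

```python
def train_with_stopping(train_one_epoch, testing_error,
                        target_error=0.01, max_epochs=1000):
    """Stop when the testing error is small enough, begins to rise,
    or a set number of iterations has passed."""
    best = float("inf")
    for epoch in range(max_epochs):
        train_one_epoch()
        err = testing_error()
        if err <= target_error:  # sufficiently small
            break
        if err > best:           # testing error begins to increase
            break
        best = err
    return best
```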

Improving Model Performance
– Re-initialize the network weights to a new set of random values and re-train the model
– Adjust the learning rate and momentum
– Modify the stopping criteria
– Prune network weights
– Use genetic algorithms to adjust the network topology
– Add noise to the training cases to decrease the chance of memorization (see the sketch below)
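
A minimal sketch of the noise-injection idea, assuming the training inputs are a NumPy array; the 1% noise level is an illustrative choice:

```python
import numpy as np

def add_noise(X, rel_sigma=0.01, seed=0):
    """Jitter training inputs with small Gaussian noise to reduce the
    chance of memorization (1% of each column's std is illustrative)."""
    rng = np.random.default_rng(seed)
    sigma = rel_sigma * X.std(axis=0)
    return X + rng.normal(0.0, 1.0, X.shape) * sigma
```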

ANN Modelling Approach

Needs and Suitability Assessment
– What are the modelling needs?
– Is the ANN technique suitable to meet these needs?
– Can the following be met?
  – data requirements
  – software requirements
  – hardware requirements
  – personnel requirements

Data Collection and Analysis
– Successful models require careful attention to data details
– Recall: relevant historical data is a key requirement
– For data collection, investigate:
  – data availability (parameters, time-frame, frequency, format)
  – QA/QC protocols (data reliability)
  – process changes

Data requirements and guidelines:
– data for each of the parameters must be available
– at least one full cycle of data must be available
– appropriate QA/QC protocols must be in place
– data collected prior to major process changes should generally not be used
Data analysis involves:
– data characterization
– a complete statistical analysis

Data characterization for each parameter:
– qualitative assessment of hourly, daily, and seasonal trends (graphical examination of the data)
– time-series analyses may be warranted
Statistical analysis for each parameter (a minimal sketch follows):
– measures of central tendency (mean, median, mode)
– measures of variability (standard deviation, variance)
– percentile analyses
– identification of outliers, erroneous entries, and non-entries
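
A minimal sketch of the per-parameter statistical analysis; the 1.5×IQR outlier rule is an assumed convention, since the slides do not specify one:

```python
import numpy as np

def characterize(x):
    """Basic statistical characterization of one parameter."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
    return {
        "mean": x.mean(), "median": np.median(x),
        "std": x.std(ddof=1), "variance": x.var(ddof=1),
        "percentiles": {p: np.percentile(x, p) for p in (5, 25, 50, 75, 95)},
        "n_outliers": outliers.size,  # 1.5*IQR rule is an assumed convention
    }
```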

Application of a Model-Building Protocol
– There is no accepted "best method" of developing ANN models
– An infinite number of distinct architectures is possible
– The purpose of a protocol is to reduce the number of architectures that must be evaluated

A sample five-step protocol:

1. Selection of model inputs and outputs
Why it's important:
– ANN models are based on process inputs and outputs
How it's done:
– first, select the model output (the best models have only one output parameter)
– next, select the model inputs from the available input parameters
– selection is based on data availability, the literature, and expert knowledge

2. Selection and organization of data patterns
Why it's important:
– ANN models are only as good as the data used
– separate, independent data sets are required to test and validate the model
How it's done:
– examine each data pattern for erroneous entries, outliers, and blank entries
– delete questionable data patterns
– sort and divide the data into training, testing, and production sets
– perform a statistical analysis on each of the three data sets

3. Determination of architecture characteristics
Why it's important:
– each modelling scenario will have an optimal architecture
How it's done:
– initially, hold many factors at software defaults or at pre-determined values, using modelling heuristics from the literature along with expert knowledge
– determine the number of hidden-layer neurons by comparing results across different runs

4. Evaluation of model stability
Why it's important:
– to ensure that model results are independent of the method of data sorting
How it's done:
– build new training, testing, and production sets from the original database
– re-train the models on the "new" data sets
– compare the results with the initial runs

5. Model fine-tuning
Why it's important:
– some models require minor improvements in order to meet process operating criteria
How it's done:
– modelling parameters previously held constant can be varied to improve model results
– the fine-tuning methodology is typically researcher-specific

Evaluating Model Performance
– In many situations, more than one good model can be developed
– The best model is the one that both meets the modelling needs initially identified and offers the smallest prediction errors
– We therefore need to be able to evaluate model performance

Prediction errors can be assessed:
– graphically (a visual representation of missed predictions)

– using statistics (a minimal sketch follows):
  – absolute measures of error: mean absolute error (MAE), maximum absolute error
  – relative measures of error: mean absolute percent error (MAPE)
  – coefficients of correlation: coefficient of correlation (r), coefficient of multiple correlation (R)
  – coefficients of determination: coefficient of determination (r²), coefficient of multiple determination (R²)
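
A minimal sketch of the single-output error statistics; MAPE assumes no zero actual values, and r is the ordinary Pearson coefficient:

```python
import numpy as np

def error_statistics(actual, predicted):
    """MAE, maximum absolute error, MAPE, r, and r^2 for one output."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - a
    r = np.corrcoef(a, p)[0, 1]
    return {
        "MAE": np.mean(np.abs(err)),
        "max_abs_error": np.max(np.abs(err)),
        "MAPE_%": 100.0 * np.mean(np.abs(err / a)),  # assumes no zero actuals
        "r": r,
        "r2": r ** 2,
    }
```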

Model residuals should also be studied (residual = prediction error). Residuals should (diagnostic plots are sketched below):
– be Normally distributed with a mean of 0: plot a histogram of the residuals
– be independent: plot the residuals (in time order, if applicable); the plot should be free of obvious trends
– have constant variance: plot the residuals against the predicted values; the plot should not show spreading, converging, or other trends
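
A minimal sketch of the three diagnostic plots, assuming Matplotlib and the residual convention residual = predicted minus actual:

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_plots(actual, predicted):
    """Histogram, time-order plot, and residual-vs-predicted plot."""
    resid = np.asarray(predicted, float) - np.asarray(actual, float)
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(resid, bins=20)        # should look Normal with mean ~ 0
    axes[0].set_title("Histogram of residuals")
    axes[1].plot(resid, marker=".")     # should be free of obvious trends
    axes[1].set_title("Residuals in time order")
    axes[2].scatter(predicted, resid, s=10)  # should not spread or converge
    axes[2].set_title("Residuals vs. predicted")
    plt.tight_layout()
    plt.show()
```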

Model Evaluation Using Real-time Data
Need to consider:
– changes in the frequency of data collection
– changes in the methodology of data collection
– the existence of QA/QC protocols to detect erroneous data
The evaluation can take the form of:
– simulated real-time testing on a stand-alone PC
– online testing in real time

Methodology
– select the time frame of the test
– port the data to the developed models
– process each data pattern and record the results
– compare the model-predicted values to the actual values
– determine the prediction errors