Intro. ANN & Fuzzy Systems Lecture 13. MLP (V): Speed Up Learning.


Lecture 13. MLP (V): Speed Up Learning. (C) by Yu Hen Hu

Outline
– Dynamic range control
– Local minimum avoidance
– Encoding symbolic features
– Modular learning

Dynamic Range Control
Saturation of the nonlinear activation function slows learning because the derivative is nearly zero there. To limit the magnitude of the net-function:
– Initialize weights to small magnitudes.
– Scale the dynamic range of the inputs and outputs.
– Use tanh(u) in the hidden layer.
– Do not use too many neurons per layer (limit N).
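The following is a minimal sketch (not from the lecture; the function and variable names are assumptions) of the first two tips: rescaling each input feature to [-1, 1] and initializing the weights with small magnitudes so the tanh hidden units start away from saturation.

```python
import numpy as np

def scale_to_range(X, lo=-1.0, hi=1.0):
    """Linearly rescale each input feature (column) to the interval [lo, hi]."""
    xmin = X.min(axis=0)
    xmax = X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)   # avoid divide-by-zero
    return lo + (hi - lo) * (X - xmin) / span

def init_small_weights(n_in, n_hidden, scale=0.1, rng=None):
    """Small random weights keep the net-function away from tanh saturation."""
    rng = np.random.default_rng(rng)
    return scale * rng.standard_normal((n_hidden, n_in + 1))  # +1 for the bias

X = np.array([[0.0, 100.0], [5.0, 250.0], [10.0, 400.0]])
Xs = scale_to_range(X)                                  # each column now spans [-1, 1]
W = init_small_weights(Xs.shape[1], n_hidden=4)
net = W @ np.hstack([Xs, np.ones((Xs.shape[0], 1))]).T  # net-function stays small
print(np.tanh(net))                                     # hidden units in their linear region
```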

Local Minimum
Error surface: E(W) is a nonlinear, continuously differentiable function of W. Gradient search converges to a local minimum, but there are often many local minima on the error surface. Settling into an inferior local minimum too early in the learning process is often undesirable.
[Figure: one-dimensional error surface with a local minimum and the global minimum marked.]

Tips to Avoid Local Minima
– Randomize the weights initially; reinitialize them whenever deemed necessary.
– Randomize the order of presentation of the data samples (the notion of "re-sampling" or bootstrapping).
– Invoke a random search step when gradient search (BP) fails to reduce the error function: one way is simply to use a larger step size η; another is to perturb the weights with small noise.
– Add small noise explicitly to the samples, or to the net-function.
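A brief illustrative sketch (names are my own, not the lecture's) of three of these tips: reshuffling the sample order each epoch, perturbing the weights when the search stalls, and keeping the best of several random restarts.

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffled_epochs(X, y, n_epochs):
    """Present the training samples in a freshly randomized order each epoch."""
    for _ in range(n_epochs):
        order = rng.permutation(len(X))
        yield X[order], y[order]

def perturb_weights(W, noise=0.01):
    """Small random kick applied when BP stalls, to escape a shallow local minimum."""
    return W + noise * rng.standard_normal(W.shape)

def best_of_restarts(train_once, n_restarts=5):
    """Random restarts: train from several random initializations and keep the run
    with the lowest final error. `train_once(seed)` is an assumed user-supplied
    routine returning (weights, final_error)."""
    results = [train_once(seed=int(rng.integers(1 << 31))) for _ in range(n_restarts)]
    return min(results, key=lambda wf: wf[1])
```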

Learning Rate and Momentum
For gradient-based search, the learning rate η should be kept small (< 0.1 or even smaller), at the price of many more search steps. A larger η can be used initially; η is too large if the RMS error fluctuates violently. A "schedule" can be made to use different values of η at different stages of learning, e.g. η = A/(1 + n/B). One purpose of the momentum term µ is to stabilize the search direction as learning converges; hence a larger µ (say, 0.9) can be used.
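A small sketch of how the schedule and the momentum update could look; the quadratic toy error and the constants A and B are assumptions for illustration only.

```python
import numpy as np

def learning_rate(n, A=0.1, B=100.0):
    """Learning-rate schedule eta = A / (1 + n/B), where n is the step (epoch) index."""
    return A / (1.0 + n / B)

def momentum_step(grad, prev_delta, eta, mu=0.9):
    """Weight update with momentum: delta_W = -eta * grad + mu * delta_W_prev."""
    return -eta * grad + mu * prev_delta

# toy usage on the quadratic error E(w) = 0.5 * ||w||^2, whose gradient is w
w = np.array([2.0, -3.0])
delta = np.zeros_like(w)
for n in range(200):
    grad = w
    delta = momentum_step(grad, delta, learning_rate(n))
    w = w + delta
print(w)   # approaches the minimum at the origin
```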

Epoch Size
The number of training samples used per epoch, K, clearly affects the speed of training. K too large: each epoch takes too long to compute. K too small: the weights are updated very frequently and the error may fluctuate violently from epoch to epoch.
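A minimal sketch of cutting the training data into blocks of K samples (toy data; names are assumptions):

```python
import numpy as np

def epochs_of_size_k(X, y, K):
    """Cut the training data into blocks of K samples; the weights would be
    updated once per block (per "epoch" in the lecture's sense)."""
    for start in range(0, len(X), K):
        yield X[start:start + K], y[start:start + K]

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
for Xk, yk in epochs_of_size_k(X, y, K=4):
    print(Xk.shape, yk.shape)   # (4, 2) (4,) ... the last block may be smaller
```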

Data Partitioning
Tuning set: a portion of the training data should be reserved as a tuning set. During training the weights are updated using the training set, while the generalization error is estimated by evaluating on the tuning set.
Cross-validation: partition the data into M portions; use M-1 portions as the training set and the remaining portion as the tuning set, then rotate so that each portion is the tuning set once. The averaged tuning-set error is a good estimate of the generalization error.
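An illustrative sketch of the M-fold rotation; the training and evaluation calls are left as placeholders.

```python
import numpy as np

def m_fold_splits(n_samples, M, rng=None):
    """Rotate M portions of the data: each portion is the tuning set once,
    and the remaining M-1 portions form the training set."""
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(n_samples), M)
    for m in range(M):
        tune = folds[m]
        train = np.concatenate([folds[j] for j in range(M) if j != m])
        yield train, tune

# the averaged tuning-set error over the M rotations estimates the generalization error
errors = []
for train_idx, tune_idx in m_fold_splits(100, M=5):
    err = 0.0   # placeholder: train on train_idx, evaluate on tune_idx
    errors.append(err)
print(np.mean(errors))
```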

Stopping Criteria
If there is a tuning set, stop when the tuning-set error reaches its minimum; this requires remembering the best set of weights found so far. Too much training may cause over-fitting. The testing set should not be used during training, even for checking stopping conditions.
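A sketch of this early-stopping rule, assuming user-supplied train_step and eval_tuning routines; the patience parameter is my addition, not from the lecture.

```python
import copy
import numpy as np

def early_stopping(train_step, eval_tuning, max_epochs=1000, patience=20):
    """Remember the weights with the lowest tuning-set error and stop when that
    error has not improved for `patience` epochs. `train_step()` performs one
    epoch of weight updates and returns the current weights; `eval_tuning(W)`
    returns the tuning-set error."""
    best_err, best_W, since_best = np.inf, None, 0
    for epoch in range(max_epochs):
        W = train_step()
        err = eval_tuning(W)
        if err < best_err:
            best_err, best_W, since_best = err, copy.deepcopy(W), 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_W, best_err
```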

Non-numerical Feature Encoding
Input coding (converting symbolic features to numerical features):
– One-in-N coding: {R, G, Y} -> {100, 010, 001}
– Thermometer coding, e.g. 1 -> 10000, 2 -> 11000, 3 -> 11100, 4 -> 11110, 5 -> 11111
Output coding:
– One-class-one-output (one-in-N coding): e.g. for 3 classes and 3 outputs, 1 0 0, 0 1 0, 0 0 1
– Error-correcting code (maximize Hamming distance): e.g. for 4 classes and 3 outputs, 1 0 0, 0 1 0, 0 0 1, 1 1 1
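A minimal sketch of the two input codings; note the slide's thermometer example values did not survive extraction, so the code follows the standard convention of k leading ones for level k.

```python
import numpy as np

def one_in_n(symbols, alphabet):
    """One-in-N (one-hot) input coding, e.g. {R, G, Y} -> 100, 010, 001."""
    index = {s: i for i, s in enumerate(alphabet)}
    codes = np.zeros((len(symbols), len(alphabet)))
    for row, s in enumerate(symbols):
        codes[row, index[s]] = 1.0
    return codes

def thermometer(values, n_levels):
    """Thermometer coding: level k becomes k ones followed by zeros."""
    codes = np.zeros((len(values), n_levels))
    for row, k in enumerate(values):
        codes[row, :k] = 1.0
    return codes

print(one_in_n(['R', 'Y', 'G'], alphabet=['R', 'G', 'Y']))
print(thermometer([1, 3, 5], n_levels=5))
```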

Modular Learning
Partition the entire network into modular pieces and train them separately.
– Output partitioning: e.g. each network focuses on one output only.
– Input partitioning: e.g. one network learns from features a and b while another learns from feature c; or each module responds to a particular subset of the data samples.
– Data fusion (glue training and fine-tuning): put all modules together, add more hidden neurons, and fine-tune the overall performance.

One Output One Net
Assumption: each network's output is proportional to the likelihood that the input feature is consistent with its class; thus the larger the output, the more confident that network is about the input feature.
[Diagram: input feature x is fed to Net #1 through Net #p; a winner-takes-all stage produces the final decision.]
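A toy sketch of the winner-takes-all decision over p per-class networks; the distance-based stand-in networks are placeholders, not the lecture's models.

```python
import numpy as np

def winner_takes_all(nets, x):
    """Each per-class net scores the input; the class whose net responds most
    strongly wins. `nets` is an assumed list of callables, one per class."""
    scores = np.array([net(x) for net in nets])
    return int(np.argmax(scores)), scores

# toy stand-ins for three trained one-output networks (closer prototype -> larger score)
nets = [lambda x, c=c: -np.sum((x - c) ** 2) for c in (0.0, 1.0, 2.0)]
decision, scores = winner_takes_all(nets, x=np.array([0.9, 1.1]))
print(decision, scores)    # class 1 wins: its "prototype" is closest to x
```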

Mixture of Experts
Each classifier is considered an expert on a particular region of the feature space. A gating network determines which expert(s) should examine the current test sample and dynamically assigns the weighting given to each expert's opinion.
[Diagram: Experts #1 to #N produce outputs y_1, ..., y_n; the gating network produces weights g_1, ..., g_n; their weighted sum gives the final result.]
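A minimal sketch of the combination step, final output = sum_i g_i * y_i, with a softmax used to normalize the gating weights; the toy experts and gating function are placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_experts(experts, gating, x):
    """Final output = sum_i g_i(x) * y_i(x): the gating network produces the
    weights g_i (normalized here by a softmax) and each expert produces an
    opinion y_i. `experts` and `gating` are assumed pre-trained callables."""
    y = np.array([expert(x) for expert in experts])   # expert opinions
    g = softmax(gating(x))                            # gating weights, sum to 1
    return g @ y, g

# toy stand-ins: two "experts" and a gating net that favours expert 0 for small x
experts = [lambda x: 0.2 * x, lambda x: 2.0 * x]
gating = lambda x: np.array([1.0 - x, x])
out, weights = mixture_of_experts(experts, gating, x=0.3)
print(out, weights)
```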

Ensemble Network
Instead of competing to give the final output, the outputs of every member classifier are combined to form a collective opinion.
– Linear combining rules: voting, weighted ensemble averaging.
– Nonlinear combining rules: stacked generalization machine.
[Diagram: Classifiers #1 to #N feed a combining rule that produces the final result.]
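A short sketch of the two linear combining rules named above (toy inputs; the shapes and weights are assumptions):

```python
import numpy as np

def majority_vote(labels):
    """Voting: the class predicted by the most member classifiers wins."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return values[np.argmax(counts)]

def weighted_average(outputs, weights):
    """Weighted ensemble averaging of the member classifiers' outputs."""
    weights = np.asarray(weights, dtype=float)
    return np.average(np.asarray(outputs, dtype=float), axis=0,
                      weights=weights / weights.sum())

print(majority_vote([2, 0, 2, 1, 2]))                 # -> 2
print(weighted_average([[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]],
                       weights=[0.5, 0.2, 0.3]))      # blended class scores
```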

Other Learning Methods
BP is a steepest-descent (gradient) optimization method. Other methods that can be used to find the weights of an MLP include:
– Conjugate gradient method
– Levenberg-Marquardt method
– Quasi-Newton method
– Least-squares Kalman filtering method
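As an illustration (not from the lecture), general-purpose optimizers such as conjugate gradient or quasi-Newton (BFGS) can be applied to the flattened weight vector of a small MLP; the sketch below uses SciPy's minimize with finite-difference gradients on an assumed 2-4-1 tanh network and synthetic data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def unpack(w):
    W1 = w[:12].reshape(4, 3)        # 4 hidden units, 2 inputs + bias
    W2 = w[12:].reshape(1, 5)        # 1 output, 4 hidden units + bias
    return W1, W2

def mse(w):
    """Mean squared error of the tiny tanh MLP as a function of the flat weight vector."""
    W1, W2 = unpack(w)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    H = np.tanh(Xb @ W1.T)
    Hb = np.hstack([H, np.ones((len(H), 1))])
    out = (Hb @ W2.T).ravel()
    return np.mean((out - y) ** 2)

w0 = 0.1 * rng.standard_normal(17)
res_cg = minimize(mse, w0, method='CG')      # conjugate gradient
res_qn = minimize(mse, w0, method='BFGS')    # quasi-Newton
print(res_cg.fun, res_qn.fun)
```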