Intro. ANN & Fuzzy Systems Lecture 14. MLP (VI): Model Selection.

Intro. ANN & Fuzzy Systems (C) 2001 by Yu Hen Hu

Outline
Universal approximation property
Network structure selection
Structural design of MLP
– Dynamic and static growing methods
– Dynamic and static pruning methods
Batch learning methods

Configuration of MLP
The best number of hidden nodes is problem and data dependent. Even if the optimal number is given, training may take a long time and finding the optimal weights may be difficult. Methods for changing the number of hidden nodes dynamically are available, but not widely used. There are two approaches: growing and pruning.

Universal Approximation of MLP
Let R be a finite region in the feature space and g(x) be an unknown function that is continuous over R. Let f(W,x) be a 2-layer MLP with sigmoidal activation functions at the hidden-layer neurons and a linear activation function at the output neuron. Then there exists a set of weights W such that f(W,x) approximates g(x) arbitrarily closely. This is an existence theorem: it guarantees that such weights exist, but it does not say how to find them or how many hidden nodes are needed.
A three-layer MLP with two layers of hidden nodes, each with a hyperplane net function and sigmoidal activation function, is capable of learning any classification task. The trick is to partition the feature space into unions of convex regions. Each region may be labeled with the same or different class labels. Then a set of hidden nodes can be used to learn each convex region.
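The approximation statement above can be written as a formula. This is only a sketch of the usual form of the result: the tolerance ε, the hidden-node count H, and the weight names v_h, w_h, b_h are notation introduced here, not taken from the slides, and R is assumed to be closed and bounded.

```latex
f(W,x) = v_0 + \sum_{h=1}^{H} v_h \,\sigma\!\left(w_h^{T} x + b_h\right),
\qquad
\forall \varepsilon > 0 \;\; \exists\, H,\, W : \;\;
\sup_{x \in R} \left| f(W,x) - g(x) \right| < \varepsilon .
```

The statement places no bound on H, which is why the choice of the number of hidden nodes remains a practical question.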

What we know so far …
For most problems, one or two hidden layers will be sufficient. If there are too many hidden layers, it may take many training epochs for the delta error to propagate backward so that learning takes place.
However, the number of hidden nodes per layer depends on many issues, and no good rule of thumb is available. If too many hidden nodes are used, there may be too much computation during training. If too few hidden nodes are used, the model complexity may be too low to learn the underlying structure of the data.
Often, repeated experimentation is the best way to figure out how many hidden nodes are best for the problem at hand.
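One way to organize such repeated experimentation is to train the same network with several hidden-layer sizes and compare the error on held-out data. The sketch below is only an illustration under stated assumptions: the function names `train_mlp` and `select_hidden`, the tanh hidden units, plain batch gradient descent, the learning rate, and the candidate sizes are all choices made here, not part of the lecture.

```python
import numpy as np

def train_mlp(X, t, n_hidden, epochs=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden, linear output) by batch
    gradient descent on the mean squared error. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)              # hidden outputs, shape (n, n_hidden)
        err = h @ W2 + b2 - t                 # output error, shape (n,)
        dh = np.outer(err, W2) * (1 - h**2)   # error propagated to the hidden layer
        W2 -= lr * h.T @ err / n
        b2 -= lr * err.mean()
        W1 -= lr * X.T @ dh / n
        b1 -= lr * dh.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def select_hidden(X_tr, t_tr, X_va, t_va, sizes=(1, 2, 4, 8)):
    """Return the candidate size with the lowest validation MSE, plus all scores."""
    scores = {}
    for n_hidden in sizes:
        model = train_mlp(X_tr, t_tr, n_hidden)
        scores[n_hidden] = float(np.mean((model(X_va) - t_va) ** 2))
    best = min(scores, key=scores.get)
    return best, scores
```

Using a separate validation set rather than the training error matters here, because the training error alone tends to favor the largest network.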

Dynamic Net Growing Methods
(1) Dynamic Node Generation (T. Ash, 1989) – New nodes are added when the output error curve flattens.
(2) Restricted Coulomb Energy (Reilly et al., 1982) – A new hidden unit is added when none of the existing hidden units is activated by the input.
(3) Grow and Learn (Alpaydin, 1990) – Hidden units compete to perform classification. If not successful, a new hidden unit is added. A sleep procedure is also devised to remove redundancies among hidden units.
(4) Cascade Correlation (Fahlman & Lebiere, 1990) – When a mapping cannot be learned, a new hidden layer with one hidden unit is added while the weights of all previous hidden units are frozen.

Dynamic Net Growing Methods
When to add a new hidden unit?
– When the total error ceases to decrease;
– When a specific training pattern cannot be learned.
When to stop adding new hidden units?
– When all training patterns are correctly classified;
– When the total approximation error < a preset bound;
– When a combined cost criterion reaches its minimum (adding more units will cause it to increase).
What to do after adding each new hidden unit?
– Retrain the entire net; or
– Train the new weights only and freeze the rest.
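The add/stop/freeze decisions above can be sketched as a simple loop. This is a deliberately simplified illustration, not any of the published growing methods: each new hidden unit gets random tanh weights that are then frozen, only the output weights are refit (by least squares), and growth stops when the mean squared error falls below a preset bound or a unit limit is reached. The name `grow_network` and all parameters are assumptions made for this example.

```python
import numpy as np

def grow_network(X, t, error_bound=1e-3, max_hidden=50, seed=0):
    """Grow hidden units one at a time until the MSE drops below error_bound.

    Hidden weights are random and frozen once added (an assumption of this
    sketch); the output layer is refit by least squares after every addition.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.ones((n, 1))      # column of ones acts as the output bias
    hidden = []              # frozen (weight vector, bias) pairs
    history = []             # MSE after each refit
    while True:
        w_out, *_ = np.linalg.lstsq(H, t, rcond=None)
        mse = float(np.mean((H @ w_out - t) ** 2))
        history.append(mse)
        # Stop: error below the preset bound, or unit budget exhausted.
        if mse < error_bound or len(hidden) >= max_hidden:
            return hidden, w_out, history
        # Add: the least-squares fit cannot improve further with current units.
        w, b = rng.normal(size=d), rng.normal()
        hidden.append((w, b))
        H = np.hstack([H, np.tanh(X @ w + b)[:, None]])
```

Replacing the stopping test with a combined cost (error plus a penalty on `len(hidden)`) would give the third stopping rule listed above.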

Dynamic Net Growing Methods
Local Representation Approach: Lateral inhibition / competitive learning is used so that only a single hidden unit is activated for a particular input, similar to the LVQ (SOFM) method.
Approximation Approach: Total cost = approximation cost (noisy data, classification error) + generalization cost (number of hidden units).
Common problem: too many redundant hidden units may be grown.
Solution: pruning after growing.

Dynamic Net Pruning Methods
Weight Decaying (Rumelhart, 1988; Chauvin, 1990; Hanson & Pratt, 1990):
Cost: $E = \sum_m \left(t^{(m)} - z^{(m)}\right)^2 + \sum_{i,j} \frac{w_{ij}^2}{1 + w_{ij}^2}$
The second term is for regularization. Then
$\frac{\partial E}{\partial w_{ij}} = -2 \sum_m \left(t^{(m)} - z^{(m)}\right) \frac{\partial z^{(m)}}{\partial w_{ij}} + \frac{2 w_{ij}}{\left(1 + w_{ij}^2\right)^2}$
For large $|w_{ij}|$, the second term is approximately 0: no effect. For small $|w_{ij}|$, the second term is approximately $2 w_{ij}$: the weight is penalized and pushed toward zero.
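The behaviour of the regularization term is easy to check numerically. The sketch below computes the penalty w²/(1+w²) and its derivative 2w/(1+w²)² for an array of weights; the function names are hypothetical, and no scaling coefficient is applied to the penalty.

```python
import numpy as np

def decay_penalty(W):
    """Sum over all weights of w^2 / (1 + w^2)."""
    W2 = np.square(W)
    return np.sum(W2 / (1.0 + W2))

def decay_gradient(W):
    """Elementwise derivative of the penalty: 2w / (1 + w^2)^2."""
    return 2.0 * W / np.square(1.0 + np.square(W))

# Small weights see a gradient close to 2w; large weights see almost none.
small, large = np.array([1e-3]), np.array([100.0])
print(decay_gradient(small), decay_gradient(large))
```

In a training loop this gradient would simply be added to the error gradient of each weight before the update step.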

Static Net Pruning Methods (I)
Skeletonization (Mozer and Smolensky, 1989) – The least relevant units are removed first.
Relevance(i) = $\partial E / \partial y(i)$
Optimal Brain Damage (Le Cun et al., 1990) – The least salient weight is removed.
Saliency = $\left. \frac{\partial^2 E}{\partial [w(i,j)]^2} \right|_{\partial E / \partial w(i,j) = 0}$
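A small sketch of saliency-based pruning follows. Two points are assumptions beyond the slide: the model is a plain linear unit z = Xw with squared error, so that the diagonal second derivatives are available in closed form, and the saliency uses the commonly cited Optimal Brain Damage form, ½ · h_kk · w_k², which weights the second derivative by the squared weight. The function names are hypothetical.

```python
import numpy as np

def obd_saliency(X, w):
    """Saliency 0.5 * h_kk * w_k^2 for a linear model z = X @ w
    trained with E = sum((t - z)**2)."""
    # Diagonal of the Hessian 2 * X^T X; for this model it does not depend on t.
    h_diag = 2.0 * np.sum(np.square(X), axis=0)
    return 0.5 * h_diag * np.square(w)

def prune_least_salient(X, w):
    """Zero out the weight with the smallest saliency; return its index too."""
    k = int(np.argmin(obd_saliency(X, w)))
    w_pruned = w.copy()
    w_pruned[k] = 0.0
    return k, w_pruned
```

Note that the weight with the largest magnitude can still be the least salient one if the error surface is flat along its direction.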

Static Net Pruning Methods (II)
Hidden Unit Reduction Method (Hu et al., 1990) – The set of hidden units whose outputs are most independent is retained.
Frobenius Norm Approximation Method (Kung & Hu, 1990) – The hidden units whose outputs best approximate the activation of the trained network are retained.

Perfect MLP Training
A batch computing method that achieves zero MLP training error WITHOUT iterative training. If M samples are available, it requires M–1 hidden units.
Basic idea: use the hidden neurons to encode each sample by a unique code: x x 1 0 0, where x = 0 or 1.

Perfect Training Algorithm
1. Sort all inputs $x^{(k)}$, $1 \le k \le K$, such that $\|x^{(k)}\| \ge \|x^{(k+1)}\|$.
2. For $k = 1$ to $K-1$: add the $k$-th neuron with weight vector $w_k$ and bias $\theta_k$ such that $w_k = x^{(k)}$ and $\theta_k = -\tfrac{1}{2}\left[\|x^{(k)}\|^2 + \max_{k' > k} [x^{(k)}]^T x^{(k')}\right]$. Assuming distinct inputs and a hard-limiting (0/1) hidden activation, neuron $k$ then outputs 1 for $x^{(k)}$ and 0 for every $x^{(k')}$ with $k' > k$.
3. Compute the hidden node output matrix $Y_{K \times (K-1)}$ for all inputs. The $Y$ matrix has 1 along its main diagonal.
4. Compute the weights and bias of the output neurons as $[\theta \;\; W] = [\mathbf{1} \;\; Y]^{-1} f^{-1}(T)$, where $T$ is the target vector (all inputs) and $f^{-1}(\cdot)$ is the inverse activation function.
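The four steps can be sketched in NumPy as follows. Several details are assumptions made for this example rather than taken from the slides: the hidden units use a hard-limiting (0/1) activation, the output neuron is a logistic sigmoid (so targets must lie strictly between 0 and 1), the inputs are distinct, and the squared norm is used in the bias. The names `perfect_train` and `predict` are hypothetical.

```python
import numpy as np

def perfect_train(X, T):
    """Non-iterative training with K-1 hidden units for K distinct samples."""
    X, T = np.asarray(X, float), np.asarray(T, float)
    # Step 1: sort samples by decreasing norm.
    order = np.argsort(-np.linalg.norm(X, axis=1), kind="stable")
    X, T = X[order], T[order]
    K = len(X)
    # Step 2: one hidden neuron per sample except the last.
    W_h = X[:-1].copy()
    G = X @ X.T                                   # all inner products
    theta_h = np.array([-0.5 * (G[k, k] + G[k, k + 1:].max())
                        for k in range(K - 1)])
    # Step 3: hidden outputs; Y[k, k] = 1 and Y[k', k] = 0 for k' > k.
    Y = (X @ W_h.T + theta_h > 0).astype(float)
    # Step 4: solve [1 Y][theta; W] = f^{-1}(T), with f the logistic sigmoid.
    A = np.hstack([np.ones((K, 1)), Y])
    out = np.linalg.solve(A, np.log(T / (1.0 - T)))
    return W_h, theta_h, out[0], out[1:]

def predict(X, W_h, theta_h, theta_o, w_o):
    Y = (np.asarray(X, float) @ W_h.T + theta_h > 0).astype(float)
    return 1.0 / (1.0 + np.exp(-(theta_o + Y @ w_o)))

# XOR with soft targets 0.1 / 0.9 (the target values are an assumption).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([0.1, 0.9, 0.9, 0.1])
params = perfect_train(X, T)
print(predict(X, *params))
```

Because the matrix [1 Y] is square and invertible under these assumptions, the output layer reproduces every target exactly, at the cost of one hidden unit per training sample.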

XOR Training Example
[Matrices shown on the slide and not preserved in this transcript: X, T, f^{-1}(T), [θ_hidden W_hidden], Y, and [θ_out W_out] = [–5 – ].]

XOR Result and Pruning
The left two neurons have the same outputs as X, and hence can be eliminated without affecting the result.