Learning in feedforward layered networks: the tiling algorithm. Authors: Marc Mézard and Jean-Pierre Nadal. Dated 21 June 1989. Published in J. Phys. A: Math. Gen. Presented by: Anwesha Das, Prachi Garg, Asmita Sharma

ABSTRACT ► Builds a feed-forward layered network in order to learn any Boolean function of N Boolean units. ► The number of layers and the number of hidden units are not known in advance. ► Proposes an algorithm for the network growth, adding layers and units within a layer. ► Convergence is guaranteed.

Outline 1. INTRODUCTION / BACKGROUND 2. TILING ALGORITHM DESCRIPTION 3. XOR PROBLEM 4. SIMULATIONS AND RESULTS 5. CONCLUSIONS AND REMARKS 6. FUTURE WORK

MOTIVATION ► The drawbacks of back-propagation: ▪ The structure of the network has to be guessed. ▪ The error is not guaranteed to converge to an absolute minimum with zero error. ▪ Use of analog neurons even where digital neurons would be necessary. ► How does one determine the couplings between neurons in successive layers to build a network achieving a given task? ► How do we address the problem of generalization? ▪ Association of correct outputs with new inputs which are not present in the training set.

Introduction ► Creates a strictly feed-forward network where connections exist from each layer only to the immediately succeeding layer. ► Units are added like tiles whenever they are needed. The first unit of each layer is the MASTER UNIT. ► The master unit is trained using the pocket algorithm and checked for the exact output. ► New ancillary units are added to this layer until we get a "faithful representation" of the problem. – Each layer is constructed in such a way that if two patterns i, j belong to different classes, then some node in the layer produces different outputs for i and j. – Any two patterns with distinct outputs have distinct internal representations.

Introduction (contd.) In the absence of the faithfulness condition, subsequent layers of nodes would be incapable of distinguishing between patterns of different classes. Each subsequent layer will have fewer nodes than the preceding layer, until there is only one node in the outermost layer for the two-class classification problem. Advantages of this strategy: – The network structure is dynamically generated through learning, not fixed in advance. – The network grows to a size which enables it to implement the desired mapping, facilitating generalisation. – The algorithm always produces a finite number of layers, which ensures termination.
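As a concrete illustration of the faithfulness condition, the following minimal Python sketch checks whether a layer's internal representations are faithful (the function name and data layout are illustrative, not taken from the paper):

```python
def is_faithful(representations, targets):
    """Faithfulness check: any two patterns with different desired outputs
    must have different internal representations in this layer.
    representations: list of tuples of +/-1 unit states (one per pattern)
    targets:         list of +/-1 desired outputs (one per pattern)"""
    seen = {}
    for rep, t in zip(representations, targets):
        if rep in seen and seen[rep] != t:
            return False  # two patterns share a prototype but need different outputs
        seen[rep] = t
    return True

# Example: after a single OR-like unit, (0,1) and (1,1) would share the
# representation (+1,) yet require different XOR outputs, so the layer is unfaithful:
print(is_faithful([(-1,), (+1,), (+1,), (+1,)], [-1, +1, +1, -1]))  # False
```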

An Example

Related Work ► Grossman et al. (1988) implemented another strategy which uses digital neurons within a structure fixed beforehand, with one intermediate layer, fixing the hidden-unit values through a trial-and-error method. ► Rujan and Marchand (1988) formulated an algorithm which adds neurons at will (similar to the tiling algorithm) but has only one intermediate layer of hidden units; the couplings between the input and hidden layers are found by an exhaustive search procedure inside a restricted set of possible couplings.

Tiling Algorithm - Basic Notions 1. Layered nets, made of binary units which can be in a plus or minus state. 2. A unit i in the Lth layer is connected to the N_{L-1}+1 units of the preceding layer, and its state S_i(L) is obtained by the threshold rule:
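In the notation defined on this slide and the next (the zeroth unit of each layer is clamped to +1, so w_{i,0}^L acts as a bias), the threshold rule can be written as

S_i^{(L)} = \mathrm{sign}\Big( \sum_{j=0}^{N_{L-1}} w_{ij}^{(L)}\, S_j^{(L-1)} \Big)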

Basic Notions (contd.) where w_{i,j}^L, j = 1, ..., N_{L-1}, are the couplings. The zeroth unit in each layer acts as a threshold, clamped in the +1 state (S_0(L) = 1), so that w_{i,0}^L plays the role of a bias. For a given set of p_0 (distinct) patterns of N_0 binary units, with p_0 ≤ 2^{N_0}, we want to learn a given mapping from patterns to desired outputs.

Theorem of Convergence We say that two patterns belong to the same class (for layer L) if they have the same internal representation in that layer, which we call the prototype of the class. The problem then becomes mapping these prototypes onto the desired outputs.

A 3-layer feed-forward network generated through the tiling algorithm

Theorem for convergence Theorem: Suppose that all the classes in layer L-1 are faithful, and that the number of errors of the master unit, e_{L-1}, is non-zero. Then there exists at least one set of weights w connecting the L-1 layer to the master unit of layer L such that e_L ≤ e_{L-1} - 1. Furthermore, one can construct explicitly one such set of weights u.

Proof of convergence 1. Let τ^ν = (τ_j^ν, j = 0, ..., N_{L-1}) be the prototypes in layer L-1 and s^ν be the desired output (+1 or -1). 2. If the master unit of the Lth layer is connected to the (L-1)th layer with the weights w given by w_1 = 1, w_j = 0 for j ≠ 1 (i.e. it simply copies the master unit of layer L-1), then e_L = e_{L-1}.

Proof of convergence (contd.) Let μ_0 be one of the patterns for which τ_1^{μ_0} = -s^{μ_0}, and let the set of weights u be u_1 = 1 and u_j = λ s^{μ_0} τ_j^{μ_0} for j ≠ 1. Then the master unit of layer L gives, for prototype μ, the value m^μ = sign( τ_1^μ + λ s^{μ_0} Σ_{j≠1} τ_j^{μ_0} τ_j^μ ). The prototype μ_0 is stabilised, i.e. m^{μ_0} = s^{μ_0}, if λ > 1/N_{L-1}.

Consider now the other patterns, those for which τ_1^μ = s^μ. The quantity Σ_{j≠1} τ_j^{μ_0} τ_j^μ can take the values -N_{L-1}, -N_{L-1}+2, ..., N_{L-1}. Because the representations in the L-1 layer are faithful, the value -N_{L-1} can never be obtained. Thus one can choose λ = 1/(N_{L-1} - 1), so that the patterns for which τ_1^μ = s^μ remain correct (m^μ = s^μ). Hence u is one particular solution which, if used to define the master unit of layer L, gives e_L ≤ e_{L-1} - 1.

Generating the master unit using the 'pocket algorithm'. If the particular set u of the previous section is taken as the initial set in the pocket algorithm, the output set w will always satisfy e_L ≤ e_{L-1} - 1. For each set of couplings w visited by the perceptron, we compute the number e(w) of prototypes for which this set would not give the desired output, each prototype ν being weighted by its volume V^ν:
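A form of this weighted error count consistent with the description (the indicator notation here is an assumption, not necessarily the paper's exact expression):

e(w) = \sum_{\nu} V^{\nu} \Big[ 1 - \delta\big( s^{\nu},\ \mathrm{sign} \sum_{j} w_j \tau_j^{\nu} \big) \Big]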

where δ is the Kronecker symbol. Eventually we keep the set of couplings which minimises e(w) among all the w visited by the perceptron. This 'optimal' set w* gives a certain number of errors e_L = e(w*). The point of the pocket algorithm is just to speed up the convergence (i.e. to generate fewer layers).
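For illustration, a pocket-style perceptron over the prototypes, with each prototype weighted by its volume, could look like the sketch below (function names, the random update schedule and the fixed iteration budget are illustrative choices, not the paper's exact procedure):

```python
import numpy as np

def weighted_errors(w, prototypes, targets, volumes):
    # e(w): total volume of prototypes whose output sign(w . tau) differs from the target
    preds = np.sign(prototypes @ w)
    preds[preds == 0] = 1
    return volumes[preds != targets].sum()

def pocket_perceptron(prototypes, targets, volumes, w_init, n_iter=1000, seed=None):
    """Run perceptron updates on randomly drawn prototypes and keep ('pocket')
    the weight vector with the smallest weighted error seen so far.
    prototypes: (p, N+1) array of +/-1 states, component 0 clamped to +1 (bias)
    targets:    (p,) array of +/-1 desired outputs
    volumes:    (p,) array of class volumes V^nu"""
    rng = np.random.default_rng(seed)
    w = np.asarray(w_init, dtype=float).copy()
    best_w, best_e = w.copy(), weighted_errors(w, prototypes, targets, volumes)
    for _ in range(n_iter):
        nu = rng.integers(len(prototypes))
        tau, s = prototypes[nu], targets[nu]
        if np.sign(w @ tau) != s:          # misclassified prototype: perceptron update
            w += s * tau
            e = weighted_errors(w, prototypes, targets, volumes)
            if e < best_e:                 # pocket the best weights visited so far
                best_w, best_e = w.copy(), e
    return best_w, best_e
```

Starting the search from the particular set u constructed in the proof ensures that the pocketed weights satisfy e_L ≤ e_{L-1} - 1.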

Building the ancillary units - Divide and Conquer If the master unit does not equal the desired output for all patterns, then at least one of the two classes (defined by the master unit's output) is unfaithful. We pick one unfaithful class and add a new unit to learn the mapping for the patterns μ belonging to this class only; the class with the smallest size is chosen. The above process is repeated until all classes are faithful.
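The divide-and-conquer loop can be sketched as follows (a simplified illustration; `train_unit` stands for any single-unit trainer such as the pocket perceptron above, and all names are hypothetical):

```python
def build_layer(prev_reps, targets, train_unit):
    """Grow one layer: train the master unit on all patterns, then keep adding
    ancillary units trained on the smallest unfaithful class until faithful.
    prev_reps:  list of tuples, the (faithful) representations in the previous layer
    targets:    list of +/-1 desired outputs, one per pattern
    train_unit: callable(patterns, targets) -> callable(representation) -> +/-1"""
    units = [train_unit(prev_reps, targets)]           # master unit
    while True:
        classes = {}                                   # group patterns by this layer's representation
        for i, rep in enumerate(prev_reps):
            key = tuple(u(rep) for u in units)
            classes.setdefault(key, []).append(i)
        unfaithful = [idx for idx in classes.values()
                      if len({targets[i] for i in idx}) > 1]
        if not unfaithful:
            return units                               # the layer is now faithful
        smallest = min(unfaithful, key=len)            # pick the smallest unfaithful class
        units.append(train_unit([prev_reps[i] for i in smallest],
                                [targets[i] for i in smallest]))
```

This mirrors the 'divide and conquer' description above: each new unit only needs to learn the mapping restricted to one unfaithful class.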

Solving the XOR Problem The two classes are {(0, 0), (1, 1)} and {(0, 1), (1, 0)}.

How do we classify? (0, 1) and (1, 0) are +ve; (0, 0) and (1, 1) are -ve. No single threshold unit can separate these two classes.

Classification into two classes The first unit separates (0, 0) [-] from (1, 0), (0, 1), (1, 1) [+], which is equivalent to an OR problem. The + class is ERRONEOUS (unfaithful), since it mixes patterns with different desired outputs.

Computation of the first neuron (partial network) The hidden-layer neuron h₁ has weights w₁ = 1, w₂ = 1 and threshold θ = 0.5, so it computes the OR function x₁ + x₂.

FAITHFUL REPRESENTATION - BREAKING THE ERRONEOUS CLASS The erroneous class {(1, 0), (0, 1), (1, 1)} is split by the new unit: (1, 0) and (0, 1) are +ve, (1, 1) is -ve, and (0, 0) is +ve, so every class is now faithful.

Computation of the Ancillary Unit After the faithful classification step we add h₂, with weights -1, -1 and threshold -1.5, which computes the complement of x₁x₂ (NAND); h₁ computes x₁ + x₂ (OR).

FINAL OUTPUT LAYER The hidden-layer outputs h₁ (x₁ + x₂) and h₂ (complement of x₁x₂) are ANDed in the next layer to produce the final output: y = h₁ · h₂.

The FINAL NETWORK h₁ (weights 1, 1; threshold 0.5) computes x₁ + x₂; h₂ (weights -1, -1; threshold -1.5) computes the complement of x₁x₂; the output layer is a single MASTER UNIT (weights 1, 1; threshold 1.5) computing y = h₁ · h₂, which gives the desired XOR output.
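A quick check of this hand-built network (inputs coded 0/1 as on the slides, hidden and output units returning ±1; an illustrative verification only, not code from the paper):

```python
def unit(weights, theta):
    """Binary threshold unit: +1 if the weighted sum exceeds theta, else -1."""
    return lambda x: 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else -1

h1  = unit((1, 1),  0.5)    # OR:   x1 + x2
h2  = unit((-1, -1), -1.5)  # NAND: complement of x1.x2
out = unit((1, 1),  1.5)    # AND of the two +/-1 hidden outputs (the master unit)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hidden = (h1(x), h2(x))
    print(x, hidden, out(hidden))   # output is +1 exactly for the XOR-true inputs
```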

SIMULATIONS ▪ EXHAUSTIVE LEARNING (USING THE FULL SET OF 2^{N_0} PATTERNS) ▪ PARITY TASK ▪ RANDOM BOOLEAN FUNCTIONS ▪ GENERALISATION ▪ QUALITY OF CONVERGENCE

PARITY TASK In the parity task for N_0 Boolean units, the output should be +1 if the number of units in state +1 is even, and -1 otherwise.
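For reference, the parity target as stated above can be generated by a trivial sketch (the ±1 coding follows the slide's convention):

```python
def parity_target(pattern):
    """+1 if the number of units in state +1 is even, -1 otherwise."""
    return 1 if sum(1 for s in pattern if s == +1) % 2 == 0 else -1

print(parity_target([+1, -1, +1, -1, -1, -1]))   # two +1 units -> even -> +1
```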

Table 1. The network generated by the algorithm when learning the parity task with N_0 = 6. There is only one hidden layer of 6 units.

OUTPUT UNIT: threshold and couplings from the hidden layer to the output unit.

RANDOM BOOLEAN FUNCTION A random Boolean function is obtained by drawing at random the output (±1 with equal probability) for each input configuration. The numbers of layers and of hidden units increase rapidly with N_0.

GENERALISATION Once a network has been built by the presentation of a training set, it performs the correct mapping for all the patterns in this training set. The question is: how does it perform on new patterns? The number of training patterns is usually smaller than 2^{N_0}. In the test problem, the N_0 input neurons are organised in a one-dimensional chain, and the task is to find out whether the number of domain walls is greater or smaller than three. A domain wall is the presence of two neighbouring neurons pointing in opposite directions. When the training patterns contain on average three domain walls, the problem is harder than for other numbers.
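The domain-wall count used in this task can be computed as below (the 'more than three walls -> +1' convention is an assumption based on the description above):

```python
def domain_walls(chain):
    """Number of domain walls in a 1-D chain of +/-1 neurons:
    a wall is a pair of neighbouring neurons pointing in opposite directions."""
    return sum(1 for a, b in zip(chain, chain[1:]) if a != b)

def target(chain):
    # assumed convention: +1 if the chain has more than three domain walls, -1 otherwise
    return 1 if domain_walls(chain) > 3 else -1

print(domain_walls([+1, +1, -1, -1, +1, -1]))   # walls between neurons 2-3, 4-5, 5-6 -> 3
```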

Quality of convergence To quantify the quality of convergence, at least two parameters can be considered: 1. The number of errors e_L which would be produced by the network if it were stopped at layer L. 2. The number of distinct internal representations (classes) p_L in each layer L. It is observed that in the range 2 ≤ L ≤ 7 the decrease in e_L is linear in L, and this seems to be the case also for p_L. It is tempting to use the slope of this linear decrease of the percentage of errors as a measure of the complexity of the problem to be learnt.

Comments 1. It is useful to limit as much as possible the number of hidden units. 2. There is a lot of freedom in the choice of the unfaithful classes to be learnt. 3. How should one choose the maximum number of iterations allowed before deciding that the perceptron algorithm has not converged? This is an adjustable parameter. 4. Optimisation of the algorithm.

Conclusions 1. Presented a new strategy for building a feed-forward layered network for any given Boolean function. 2. Identified some possible roles of the hidden units: the master units and the ancillary units. 3. The geometry, including the number of units involved and the connections, is not fixed in advance, but generated by a growth process.

FUTURE WORK 1. Improvement of the algorithm by investigating variants of the perceptron algorithm used to build the hidden units. 2. Systematic comparison of the performance (efficiency, computer time, size of the architecture) of the tiling algorithm with that of other algorithms. 3. Generalisation to neurons with continuous values: does the algorithm work for continuous inputs and binary outputs? What happens when the data are conflicting, i.e. identical patterns have different outputs? 4. Finding a strategy which limits as much as possible the number of units in each layer. 5. Generalisation to several output units.

THANK YOU...