1 Learning in feed forward layered Networks: The Tiling Algorithm. Authors: Marc Mézard and Jean-Pierre Nadal. Dated 21 June 1989. Published in J. Phys. A: Math. Gen. Presented by: Anwesha Das, Prachi Garg, Asmita Sharma

2 ABSTRACT ► Builds a feed-forward layered network to learn any Boolean function of N Boolean units. ► The number of layers and the number of hidden units are not known in advance. ► Proposes an algorithm for network growth: adding layers, and adding units within a layer. ► Convergence is guaranteed.

3 Outline 1. INTRODUCTION / BACKGROUND 2. TILING ALGORITHM DESCRIPTION 3. XOR PROBLEM 4. SIMULATIONS AND RESULTS 5. CONCLUSIONS AND REMARKS 6. FUTURE WORK

4 MOTIVATION ► The drawbacks of back propagation: –The structure of the network has to be guessed. –The error is not guaranteed to converge to an absolute minimum with zero error. –Analog neurons are used even in cases where digital neurons would be natural. ► How does one determine the couplings between neurons in successive layers to build a network achieving a given task? ► How do we address the problem of generalization? –Associating correct outputs with new inputs that are not present in the training set.

5 Introduction ► Creates a strictly feed-forward network in which connections run from each layer only to the immediately succeeding layer. ► Units are added like tiles whenever they are needed. The first unit of a layer is the MASTER UNIT. ► The master unit is trained using the pocket algorithm and checked for the exact output. ► New ancillary units are added to the layer until we get a "faithful representation" of the problem. –Each layer is constructed so that if two patterns i, j belong to different classes, then some unit in the layer produces different outputs for i and j. –Any two patterns with distinct outputs have distinct internal representations.

6 Introduction contd. In the absence of the faithfulness condition, subsequent layers of units would be incapable of distinguishing between patterns of different classes. Each subsequent layer has fewer units than the preceding layer, until there is only one unit in the final layer for the two-class classification problem. Advantages of this strategy: –The network structure is dynamically generated through learning, not fixed in advance. –The network grows to a size that enables it to implement the desired mapping, which facilitates generalisation. –The algorithm always produces a finite number of layers, which ensures termination.
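The faithfulness condition can be stated as a small test. The sketch below is purely illustrative (the function name and data layout are our own, not the paper's): a layer is faithful when no internal representation is shared by patterns with different target outputs.

```python
# Minimal sketch of the faithfulness test: within a layer, two patterns with
# different target outputs must not share the same internal representation.
def is_faithful(representations, targets):
    """representations[i]: tuple of +/-1 unit states of pattern i in this layer;
    targets[i]: desired output (+1 or -1) of pattern i."""
    seen = {}  # internal representation -> target it was first seen with
    for rep, target in zip(representations, targets):
        if rep in seen and seen[rep] != target:
            return False  # two classes collapse onto one prototype: unfaithful
        seen[rep] = target
    return True

# Two patterns with opposite targets but the same representation -> unfaithful.
print(is_faithful([(1, -1), (1, -1)], [+1, -1]))  # False
print(is_faithful([(1, -1), (-1, 1)], [+1, -1]))  # True
```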

7 An Example

8 Related Work ► Grossman et al. (1988) implemented another strategy which uses digital neurons within a structure fixed beforehand, with one intermediate layer, fixing the hidden-unit values by trial and error. ► Rujan and Marchand (1988) formulated an algorithm which also adds neurons as needed (similar to the tiling algorithm) but has only one intermediate layer of hidden units; the couplings between the input and hidden layers are found by an exhaustive search within a restricted set of possible couplings.

9 Tiling Algorithm - Basic Notions 1. Layered nets, made of binary units which can be in a plus or minus state. 2. A unit $i$ in the $L$th layer is connected to the $N_{L-1}+1$ units of the preceding layer, and its state $S_i^{(L)}$ is obtained by the threshold rule: $S_i^{(L)} = \mathrm{sign}\big( \sum_{j=0}^{N_{L-1}} w_{ij}^{(L)} S_j^{(L-1)} \big)$.

10 Basic Notions (contd.) where $w_{ij}^{(L)}$, $j = 1, 2, \dots, N_{L-1}$, are the couplings. The zeroth unit of each layer acts as a threshold, clamped in the +1 state ($S_0^{(L)} = 1$), so $w_{i0}^{(L)}$ plays the role of the bias. For a given set of $p_0$ (distinct) patterns of $N_0$ binary units, we want to learn a given mapping, with $p_0 \le 2^{N_0}$.
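As a concrete illustration of the threshold rule and the clamped zeroth unit, here is a minimal sketch (the array shapes and names are assumptions, not the authors' code):

```python
import numpy as np

def layer_states(prev_states, W):
    """prev_states: +/-1 states of layer L-1 (length N_{L-1}).
    W: couplings of shape (N_L, N_{L-1} + 1); column 0 multiplies the clamped
    zeroth unit and therefore plays the role of the bias."""
    s = np.concatenate(([1.0], prev_states))  # zeroth unit clamped to +1
    return np.sign(W @ s)                     # S_i^(L) = sign(sum_j w_ij^(L) S_j^(L-1))

# Example: a single unit computing OR of two +/-1 inputs (bias 0.5, weights 1, 1).
W = np.array([[0.5, 1.0, 1.0]])
print(layer_states(np.array([-1.0, -1.0]), W))  # [-1.]
print(layer_states(np.array([-1.0, +1.0]), W))  # [1.]
```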

11 Theorem of Convergence We say that two patterns belong to the same class (for the layer L) if they have the same internal representation, which we call the prototype of the class. The problem becomes to map these prototypes onto the desired output.

12 A 3-layer feed-forward network generated by the tiling algorithm

13 Theorem for Convergence Theorem: Suppose that all the classes in layer L−1 are faithful, and that the number of errors of the master unit, $e_{L-1}$, is non-zero. Then there exists at least one set of weights $w$ connecting layer L−1 to the master unit of layer L such that $e_L \le e_{L-1} - 1$. Furthermore, one can construct explicitly one such set of weights $u$.

14 Proof of Convergence 1. Let $\tau^{\nu} = (\tau_j^{\nu},\ j = 0, \dots, N_{L-1})$ be the prototypes in layer L−1 and $s^{\nu}$ be the desired output (+1 or −1). 2. If the master unit of the Lth layer is connected to the (L−1)th layer with the weights $w$ ($w_1 = 1$, $w_j = 0$ for $j \ne 1$), then $e_L = e_{L-1}$.

15 Proof of Convergence (contd.) Let $\mu_0$ be one of the patterns for which $\tau_1^{\mu_0} = -s^{\mu_0}$, and let the set of weights $u$ be $u_1 = 1$ and $u_j = \lambda\, s^{\mu_0} \tau_j^{\mu_0}$ for $j \ne 1$. Then $m^{\mu} = \mathrm{sign}\big( \tau_1^{\mu} + \lambda\, s^{\mu_0} \sum_{j \ne 1} \tau_j^{\mu_0} \tau_j^{\mu} \big)$, where $m^{\mu}$ is the value of the master unit for prototype $\mu$ obtained from $u$. The prototype $\mu_0$ is stabilised, i.e. $m^{\mu_0} = s^{\mu_0}$, if $\lambda > 1/N_{L-1}$.

16 Proof of Convergence (contd.) Consider another pattern $\mu$ for which $\tau_1^{\mu} = s^{\mu}$. The quantity $\sum_{j \ne 1} \tau_j^{\mu_0} \tau_j^{\mu}$ can take the values $-N_{L-1}, -N_{L-1}+2, \dots, N_{L-1}$. Because the representations in layer L−1 are faithful, the value $-N_{L-1}$ can never be obtained. Thus one can choose $\lambda = 1/(N_{L-1} - 1)$, so that the patterns for which $\tau_1^{\mu} = s^{\mu}$ remain correct ($m^{\mu} = s^{\mu}$). Hence $u$ is one particular solution which, if used to define the master unit of layer L, gives $e_L \le e_{L-1} - 1$.
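A quick numerical check of this construction on made-up toy prototypes (the data below are purely illustrative, not from the paper): building u as above reduces the number of errors of the master unit by at least one.

```python
import numpy as np

# Toy prototypes: column 0 is the clamped +1 unit, column 1 the current master unit.
tau = np.array([
    [1, -1,  1, -1],   # prototype mu0: master unit wrong (tau_1 = -s)
    [1,  1,  1,  1],   # correct
    [1,  1, -1,  1],   # wrong (tau_1 != s)
])
s = np.array([1, 1, -1])           # desired outputs
N = tau.shape[1] - 1               # N_{L-1} = 3

mu0 = 0                            # a pattern on which the master unit errs
lam = 1.0 / (N - 1)                # lambda = 1/(N_{L-1} - 1)
u = lam * s[mu0] * tau[mu0].astype(float)
u[1] = 1.0                         # u_1 = 1, u_j = lambda*s^mu0*tau_j^mu0 otherwise

m = np.sign(tau @ u)               # master-unit values under the new weights u
print("errors before:", int(np.sum(tau[:, 1] != s)))  # 2
print("errors after: ", int(np.sum(m != s)))          # 1  (e_L <= e_{L-1} - 1)
```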

17 Generating the Master Unit: the Pocket Algorithm If the particular set $u$ of the previous slides is taken as the initial set in the pocket algorithm, the output set $w$ will always satisfy $e_L \le e_{L-1} - 1$. For each set of couplings $w$ visited by the perceptron, we compute the number $e(w)$ of prototypes for which this set would not give the desired output, each prototype $\nu$ being weighted by its volume $V^{\nu}$: $e(w) = \sum_{\nu} V^{\nu} \big( 1 - \delta_{m^{\nu}(w),\, s^{\nu}} \big)$

18 Generating the Master Unit (contd.) where $\delta$ is the Kronecker symbol. Eventually we keep the set of couplings which minimises $e(w)$ among all the $w$ which have been visited by the perceptron. This 'optimal' set $w^*$ gives a certain number of errors $e_L = e(w^*)$. The point of the pocket algorithm is to speed up the convergence (i.e. to generate fewer layers).
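A hedged sketch of this training step (the initial weights, iteration budget and data layout are assumptions): ordinary perceptron updates over the prototypes, with the best volume-weighted weight vector kept in the "pocket".

```python
import numpy as np

def volume_weighted_errors(w, tau, s, vol):
    # e(w): prototypes misclassified by w, each weighted by its volume V^nu
    return float(np.sum(vol * (np.sign(tau @ w) != s)))

def pocket(tau, s, vol, w0, n_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    w = w0.astype(float)
    best_w, best_e = w.copy(), volume_weighted_errors(w, tau, s, vol)
    for _ in range(n_iter):
        nu = rng.integers(len(s))                 # pick a prototype at random
        if np.sign(tau[nu] @ w) != s[nu]:         # misclassified: perceptron step
            w = w + s[nu] * tau[nu]
            e = volume_weighted_errors(w, tau, s, vol)
            if e < best_e:                        # better than the pocket: keep it
                best_w, best_e = w.copy(), e
    return best_w, best_e
```

Seeding w0 with the particular set u of the previous slides guarantees that the returned weights never do worse than e_{L-1} − 1, since the pocket only ever improves on its starting point.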

19 Building the Ancillary Units: Divide and Conquer If the master unit is not equal to the desired output for all patterns, then at least one of its two classes is unfaithful. We pick one unfaithful class and add a new unit to learn the mapping for the patterns μ belonging to this class only; the class with the smallest size is chosen. The above process is repeated until all classes are faithful.
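The class bookkeeping behind this divide-and-conquer step can be sketched as follows (the data structures are assumptions, not the paper's notation): group the patterns by their internal representation in the current layer, keep only the unfaithful groups, and return the smallest one as the class the next ancillary unit will be trained on.

```python
from collections import defaultdict

def smallest_unfaithful_class(representations, targets):
    classes = defaultdict(list)
    for i, rep in enumerate(representations):
        classes[rep].append(i)                    # patterns sharing one prototype
    unfaithful = [idx for idx in classes.values()
                  if len({targets[i] for i in idx}) > 1]
    return min(unfaithful, key=len) if unfaithful else None  # None: layer is faithful

# Example: outputs (h1,) of the OR master unit on the four XOR patterns.
reps = [(-1,), (+1,), (+1,), (+1,)]               # (0,0) vs. the other three
targets = [-1, +1, +1, -1]                        # XOR targets in +/-1 form
print(smallest_unfaithful_class(reps, targets))   # [1, 2, 3] -> add an ancillary unit
```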

20 Solving the XOR Problem The two classes that we have are {(0,0), (1,1)} (output 0) and {(0,1), (1,0)} (output 1).

21 How do we classify? (Diagram: the four patterns in the plane, with (0,1) and (1,0) marked +ve and (0,0) and (1,1) marked −ve; no single line separates the two classes.)

22 Classification into two classes The first unit separates (0,0) [−] from (1,0), (0,1), (1,1) [+], which is equivalent to an OR problem. The + class is the ERRONEOUS (unfaithful) class, since it mixes patterns with different target outputs.

23 Computation of the first neuron (partial network) Inputs x₁ and x₂ feed the hidden-layer neuron h₁ with weights w₁ = 1, w₂ = 1 and threshold Θ = 0.5, so h₁ computes the OR function (x₁ + x₂).

24 FAITHFUL REPRESENTATION: breaking the erroneous class Within the erroneous class {(1,0), (0,1), (1,1)}, the new unit separates (1,0) and (0,1) [+ve] from (1,1) [−ve]; together with (0,0) this gives a faithful representation.

25 Computation of the Ancillary Unit A second hidden unit h₂ is added in the same layer, connected to x₁ and x₂ (the figure shows the weights and thresholds 0.5, 1.0 and −1.5). After the faithful classification, h₂ computes NOT(x₁·x₂), while h₁ still computes x₁ + x₂.

26 FINAL OUTPUT LAYER

x₁  x₂   h₁ = x₁ + x₂   h₂ = NOT(x₁·x₂)   y = h₁·h₂
0   0        0                1               0
0   1        1                1               1
1   0        1                1               1
1   1        1                0               0

The outputs of the hidden layer are ANDed in the next layer to obtain the final output: y = h₁·h₂.

27 The FINAL NETWORK Inputs x₁ and x₂ feed the hidden units h₁ (computing x₁ + x₂) and h₂ (computing NOT(x₁·x₂)); after ANDing them, the output layer consists of a single MASTER UNIT y = h₁·h₂ giving the desired XOR output.
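As a sanity check, the whole hand-built network can be written down with 0/1 threshold units; the thresholds below are chosen so that each unit reproduces the function named in the figures, and are not necessarily the exact numbers shown on the slides.

```python
import numpy as np

def unit(weights, theta, inputs):
    # 0/1 threshold unit: fires when the weighted sum exceeds the threshold theta
    return 1 if np.dot(weights, inputs) > theta else 0

def xor_net(x1, x2):
    h1 = unit([1.0, 1.0], 0.5, [x1, x2])      # OR:   x1 + x2 > 0.5
    h2 = unit([-1.0, -1.0], -1.5, [x1, x2])   # NAND: -(x1 + x2) > -1.5
    return unit([1.0, 1.0], 1.5, [h1, h2])    # AND of h1 and h2 -> XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))            # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```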

28 SIMULATIONS  EXHAUSTIVE LEARNING (using the full set of 2^N patterns)  PARITY TASK  RANDOM BOOLEAN FUNCTIONS  GENERALISATION  QUALITY OF CONVERGENCE

29 PARITY TASK  In the parity task for N₀ Boolean units, the output should be +1 if the number of units in state +1 is even, and −1 otherwise.
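For instance, the exhaustive parity training set can be generated as follows (a small helper of our own, not the paper's code):

```python
from itertools import product

def parity_patterns(n0):
    """All 2**n0 patterns of +/-1 units with the parity target described above."""
    patterns, targets = [], []
    for bits in product((+1, -1), repeat=n0):
        patterns.append(bits)
        targets.append(+1 if sum(b == +1 for b in bits) % 2 == 0 else -1)
    return patterns, targets

pats, ts = parity_patterns(6)
print(len(pats))   # 64: exhaustive learning presents the full set of patterns
```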

30 Table 1. The network generated by the algorithm when learning the parity task with N₀ = 6. There is only one hidden layer, of 6 units.

31 OUTPUT UNIT (Table 1, continued): the threshold and the couplings from the hidden layer to the output unit.

32 RANDOM BOOLEAN FUNCTIONS A random Boolean function is obtained by drawing at random the output (±1 with equal probability) for each input configuration. The numbers of layers and of hidden units increase rapidly with N₀.
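Drawing such a target can be sketched in a couple of lines (the seed and the value of N₀ below are arbitrary choices for illustration):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n0 = 4
inputs = list(product((+1, -1), repeat=n0))
targets = rng.choice([+1, -1], size=len(inputs))  # one random +/-1 output per configuration
print(len(inputs), targets[:8])
```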

33 GENERALISATION Once a network has been built by the presentation of a training set, it performs the correct mapping for all the patterns in this training set. The question is: how does it perform on new patterns? The number of training patterns is usually smaller than 2^N₀. In the task studied, the N₀ input neurons are organised in a one-dimensional chain, and the problem is to find out whether the number of domain walls is greater or smaller than three. A domain wall is the presence of two neighbouring neurons pointing in opposite directions. When the average number of domain walls in the training patterns is three, the problem is harder than for other values.
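The domain-wall target can be written down directly; the sketch below assumes the convention that the output is +1 when the wall count exceeds three (the slide does not fix the sign convention).

```python
def count_domain_walls(chain):
    # a wall sits between two neighbouring neurons with opposite signs
    return sum(1 for a, b in zip(chain, chain[1:]) if a != b)

def wall_target(chain, threshold=3):
    return +1 if count_domain_walls(chain) > threshold else -1

chain = [+1, +1, -1, -1, +1, -1]
print(count_domain_walls(chain), wall_target(chain))   # 3 walls -> target -1
```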

34 Quality of Convergence To quantify the quality of convergence, at least two parameters can be considered: 1. The number of errors e_L which would be produced by the network if it were stopped at layer L. 2. The number of distinct internal representations (classes) p_L in each layer L. It is noticed that in the range 2 ≤ L ≤ 7 the decrease in e_L is linear in L, and this seems to be the case also for p_L. It is tempting to use the slope of this linear decrease of the percentage of errors as a measure of the complexity of the problem to be learnt.
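Both quantities are easy to extract from a partially built network; a minimal sketch (the function names and data layout are ours):

```python
def p_L(representations):
    # number of distinct internal representations (classes) in layer L
    return len(set(representations))

def e_L(master_outputs, targets):
    # errors the network would make if stopped at layer L with its current master unit
    return sum(1 for m, t in zip(master_outputs, targets) if m != t)

reps = [(-1, 1), (1, 1), (1, -1), (1, 1)]
print(p_L(reps))                           # 3 distinct prototypes
print(e_L([-1, 1, 1, 1], [-1, 1, 1, -1]))  # 1 error
```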

35 Comments 1. It is useful to limit as much as possible the number of hidden units. 2. There is a lot of freedom in the choice of the unfaithful classes to be learnt. 3. How should one choose the maximum number of iterations allowed before deciding that the perceptron algorithm has not converged? This is an adjustable parameter. 4. Optimisation of the algorithm.

36 Conclusions 1. Presented a new strategy for building a feed-forward layered network for any given Boolean function. 2. Identified some possible roles of the hidden units: the master units and the ancillary units. 3. The geometry, including the number of units involved and the connections, is not fixed in advance, but generated by a growth process.

37 FUTURE WORK 1. Improvement of the algorithm by investigating variants of the perceptron algorithm used to build the hidden units. 2. Systematic comparison of the performance (efficiency, computer time, size of the architecture) of the tiling algorithm with that of other algorithms. 3. Generalization to neurons with continuous values: does the algorithm work for continuous inputs and binary outputs? What happens when the data are conflicting, i.e. identical patterns have different outputs? 4. Finding a strategy which limits as much as possible the number of units in each layer. 5. Generalization to several output units.

38 THANK YOU...

