1
Contents Sequences Time Delayed I Time Delayed II Recurrent I CS 476: Networks of Neural Computation, CSD, UOC, 2009 Recurrent II Conclusions
WK5 – Dynamic Networks
CS 476: Networks of Neural Computation
WK5 – Dynamic Networks: Time Delayed & Recurrent Networks
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009
2
Contents
Sequence Learning
Time Delayed Networks I: Implicit Representation
Time Delayed Networks II: Explicit Representation
Recurrent Networks I: Elman + Jordan Networks
Recurrent Networks II: Back Propagation Through Time
Conclusions
3
Sequence Learning
MLP & RBF networks are static networks, i.e. they learn a mapping from a single input signal to a single output response, for an arbitrarily large number of pairs.
Dynamic networks learn a mapping from a single input signal to a sequence of response signals, for an arbitrary number of (signal, sequence) pairs.
Typically the input signal to a dynamic network is an element of the sequence, and the network then produces the rest of the sequence as its response.
To learn sequences we need to include some form of memory (short-term memory) in the network.
4
Sequence Learning II
We can introduce memory effects in two principal ways:
Implicit: e.g. a time-lagged signal as input to a static network, or recurrent connections
Explicit: e.g. the Temporal Backpropagation method
In the implicit form, we assume that the environment from which we collect examples of (input signal, output sequence) is stationary. In the explicit form the environment can be non-stationary, i.e. the network can track changes in the structure of the signal.
5
Time Delayed Networks I
The time delayed approach includes two basic types of networks:
Implicit Representation of Time: we combine a memory structure in the input layer of the network with a static network model
Explicit Representation of Time: we explicitly allow the network to code time, by generalising the network weights from scalars to vectors, as in TBP (Temporal Backpropagation).
Typical forms of memory are the Tapped Delay Line and the Gamma Memory family.
6
Time Delayed Networks I
The Tapped Delay Line form of memory is shown below for an input signal x(n):
The Gamma form of memory is defined by:
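The two memory structures named on this slide can be sketched in code. This is a minimal illustration under stated assumptions, not the lecture's own material: the function names, the zero initial conditions and the parameter name mu are hypothetical, and the gamma update used is the commonly cited recursion in which tap k is a leaky integrator fed by tap k-1. With mu = 1 the gamma memory reduces to the tapped delay line.

```python
from collections import deque


def tapped_delay_line(signal, p):
    """Yield [x(n), x(n-1), ..., x(n-p)] for each n (zeros before the signal starts)."""
    taps = deque([0.0] * (p + 1), maxlen=p + 1)
    for x in signal:
        taps.appendleft(x)  # newest sample enters tap 0, the oldest one drops off
        yield list(taps)


def gamma_memory(signal, p, mu):
    """Yield the p+1 tap values of a gamma memory with parameter mu (0 < mu <= 1)."""
    taps = [0.0] * (p + 1)
    for x in signal:
        prev = taps[:]  # tap values at time n-1
        taps[0] = x     # tap 0 is the raw input x(n)
        for k in range(1, p + 1):
            # assumed recursion: x_k(n) = (1 - mu) * x_k(n-1) + mu * x_{k-1}(n-1)
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        yield taps[:]
```

For example, `list(tapped_delay_line([1.0, 2.0, 3.0], 2))` gives the windows `[1, 0, 0]`, `[2, 1, 0]`, `[3, 2, 1]`; choosing mu below 1 trades temporal resolution for a longer memory depth with the same number of taps.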
7
Time Delayed Networks I
The Gamma Memory is shown below:
8
Time Delayed Networks I
In the implicit representation approach we combine a static network (e.g. MLP / RBF) with a memory structure (e.g. tapped delay line). An example is shown below (the NETtalk network):
9
Time Delayed Networks I
We present the data in a sliding window. For example, in NETtalk the middle group of input neurons presents the letter in focus. The remaining input groups, three before and three after, present the context. The purpose is to predict, for example, the next symbol in the sequence.
The NETtalk model (Sejnowski & Rosenberg, 1987) has:
203 input nodes
80 hidden neurons
26 output neurons
18629 weights
It was trained with the BP method.
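The sliding-window presentation described above can be sketched as follows. This is an illustrative sketch only: the function names, the padding character and the 29-symbol alphabet (26 letters plus three extra symbols) are assumptions chosen so that 7 groups of 29 units give the 203 input nodes quoted on the slide.

```python
def sliding_windows(text, before=3, after=3, pad="_"):
    """Return (window, focus_letter) pairs; the focus sits in the middle of each window."""
    padded = pad * before + text + pad * after
    width = before + 1 + after
    return [(padded[i:i + width], padded[i + before]) for i in range(len(text))]


def one_hot(window, alphabet):
    """Concatenate one one-hot group per window position (one input group per letter)."""
    vec = []
    for ch in window:
        vec.extend(1.0 if ch == a else 0.0 for a in alphabet)
    return vec
```

With `alphabet = "abcdefghijklmnopqrstuvwxyz_.,"`, each 7-letter window from `sliding_windows("cat")` encodes to a vector of 7 × 29 = 203 values, exactly one of which is active per group.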
10
Time Delayed Networks II
In explicit time representation, neurons have a spatio-temporal structure, i.e. each synapse arriving at a neuron is not a scalar number but a vector of weights, which is convolved with the time-delayed output signal of the previous neuron.
A schematic representation of a neuron is given below:
11
Time Delayed Networks II
The output of the neuron in this case is given by:
In the case of a whole network, for example assuming a single output node and a linear output layer, the response is given by:
where p is the depth of the memory and $b_0$ is the bias of the output neuron.
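The two equations referred to on this slide can be sketched in their standard form. The symbols $\varphi$ (activation function), $b_j$ (bias of hidden neuron j), $w_j$ (output weight of hidden neuron j) and $m_1$ (number of hidden neurons) are assumptions introduced here for illustration; p and $b_0$ are as defined on the slide.

```latex
% Output of a single neuron j fed by a memory of depth p
y_j(n) = \varphi\!\left( \sum_{l=0}^{p} w_j(l)\, x(n-l) + b_j \right)

% Response of a network with one linear output neuron
y(n) = \sum_{j=1}^{m_1} w_j\, y_j(n) + b_0
     = \sum_{j=1}^{m_1} w_j\, \varphi\!\left( \sum_{l=0}^{p} w_j(l)\, x(n-l) + b_j \right) + b_0
```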
12
Time Delayed Networks II
In the more general case, where we have multiple neurons at the output layer, we have for neuron j of any layer:
The output of any synapse is given by the convolution sum:
13
Time Delayed Networks II
Where the state vector $\mathbf{x}_i(n)$ and the weight vector $\mathbf{w}_{ji}$ for synapse i are defined as follows:
$\mathbf{x}_i(n) = [x_i(n), x_i(n-1), \ldots, x_i(n-p)]^T$
$\mathbf{w}_{ji} = [w_{ji}(0), w_{ji}(1), \ldots, w_{ji}(p)]^T$
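With these vectors, the synapse output is the inner product $s_{ji}(n) = \mathbf{w}_{ji}^T \mathbf{x}_i(n) = \sum_{l=0}^{p} w_{ji}(l)\, x_i(n-l)$, and the neuron applies its activation to the sum of its synapse outputs plus a bias. A minimal sketch, assuming hypothetical function names and an activation `phi` passed in by the caller:

```python
def synapse_output(w_ji, x_history):
    """s_ji(n) = sum_l w_ji(l) * x_i(n-l), with x_history = [x_i(n), ..., x_i(n-p)]."""
    assert len(w_ji) == len(x_history)
    return sum(w * x for w, x in zip(w_ji, x_history))


def neuron_output(weights, histories, bias, phi):
    """y_j(n) = phi(v_j(n)), where v_j(n) sums all synapse outputs plus the bias."""
    v = sum(synapse_output(w, x) for w, x in zip(weights, histories)) + bias
    return phi(v)
```

Each synapse therefore acts as a small FIR filter on the delayed outputs of the previous neuron, which is what distinguishes it from the scalar weight of a static MLP.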
14
Time Delayed Networks II: Learning Law
To train such a network we use the Temporal BackPropagation algorithm, presented below.
Assume that neuron j lies in the output layer and that its response at time n is denoted by $y_j(n)$, while its desired response is $d_j(n)$. We can define an instantaneous value for the sum of squared errors produced by the network as follows:
15
Time Delayed Networks II: Learning Law-1
The error signal at the output layer is defined by:
The idea is to minimise an overall cost function, calculated over all time:
We could proceed as usual by calculating the gradient of the cost function with respect to the weights. This implies that we need to calculate the instantaneous gradient:
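The quantities introduced on the last two slides can be written out in their standard form. This is a sketch using the symbols already defined in the text ($y_j(n)$, $d_j(n)$); the names $E(n)$ and $E_{\text{total}}$ for the instantaneous and overall cost are assumptions.

```latex
% Instantaneous sum of squared errors over the output neurons
E(n) = \frac{1}{2} \sum_{j} e_j^2(n)

% Error signal at output neuron j
e_j(n) = d_j(n) - y_j(n)

% Overall cost, accumulated over all time
E_{\text{total}} = \sum_{n} E(n)

% Gradient via instantaneous gradients
\frac{\partial E_{\text{total}}}{\partial \mathbf{w}_{ji}}
  = \sum_{n} \frac{\partial E(n)}{\partial \mathbf{w}_{ji}}
```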
16
Time Delayed Networks II: Learning Law-2
However, for this approach to work we need to unfold the network in time (i.e. convert it to an equivalent static network and then calculate the gradient). This option has a number of disadvantages:
A loss of symmetry between the forward and backward passes for the calculation of the instantaneous gradient;
No nice recursive formula for the propagation of error terms;
A need for global bookkeeping to keep track of which static weights are actually the same in the equivalent network.
17
Time Delayed Networks II: Learning Law-3
For these reasons we prefer to calculate the gradient of the cost function as follows:
Note that in general the following holds:
Equality holds only when we take the sum over all time.
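The alternative expansion of the gradient, and the inequality noted on this slide, can be sketched as follows. The notation $E_{\text{total}}$, $E(n)$ and $v_j(n)$ (induced field of neuron j) follows the usual textbook presentation and is an assumption here.

```latex
% Gradient expanded through the induced field of neuron j
\frac{\partial E_{\text{total}}}{\partial \mathbf{w}_{ji}}
  = \sum_{n} \frac{\partial E_{\text{total}}}{\partial v_j(n)}\,
             \frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}}

% Term by term, this differs from the instantaneous gradient
\frac{\partial E_{\text{total}}}{\partial v_j(n)}\,
\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}}
  \neq \frac{\partial E(n)}{\partial \mathbf{w}_{ji}}
```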
18
Time Delayed Networks II: Learning Law-4
To calculate the weight update we use the steepest descent method:
where $\eta$ is the learning rate.
We calculate the terms in the above relation as follows:
This is by definition the induced field $v_j(n)$.
19
Time Delayed Networks II: Learning Law-5
We define the local gradient as:
Thus we can write the weight update equations in the familiar form:
We need to calculate the local gradient $\delta_j(n)$ for the cases of output and hidden layers.
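The steps on the last two slides can be summarised in the standard form below. This is a sketch: $\eta$ is the learning rate, $\mathbf{x}_i(n)$ and $\mathbf{w}_{ji}$ are the state and weight vectors defined earlier, and $E_{\text{total}}$ is an assumed name for the overall cost.

```latex
% Steepest descent on the overall cost
\mathbf{w}_{ji}(n+1) = \mathbf{w}_{ji}(n)
  - \eta\, \frac{\partial E_{\text{total}}}{\partial v_j(n)}\,
           \frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}(n)}

% Since v_j(n) = \sum_i \mathbf{w}_{ji}^T \mathbf{x}_i(n) + b_j
\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}(n)} = \mathbf{x}_i(n)

% Local gradient
\delta_j(n) = -\frac{\partial E_{\text{total}}}{\partial v_j(n)}

% Familiar form of the update
\mathbf{w}_{ji}(n+1) = \mathbf{w}_{ji}(n) + \eta\, \delta_j(n)\, \mathbf{x}_i(n)
```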
20
Time Delayed Networks II: Learning Law-6
For the output layer the local gradient is given by:
For a hidden layer we assume that neuron j is connected to a set A of neurons in the next layer (hidden or output). Then we have:
21
Time Delayed Networks II: Learning Law-7
By re-writing we get the following:
Finally, putting it all together, we get:
22
Time Delayed Networks II: Learning Law-8
Where l is the layer level.
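The local gradients and the final update rule can be sketched as below, following the usual textbook derivation of Temporal BackPropagation. The vector $\boldsymbol{\Delta}_r(n)$ and the exact indexing of the causal form are assumptions; they should be checked against the original slides or the course text.

```latex
% Output layer
\delta_j(n) = e_j(n)\, \varphi'(v_j(n))

% Hidden layer, neuron j feeding the set A of neurons in the next layer
\delta_j(n) = \varphi'(v_j(n)) \sum_{r \in A} \sum_{l=0}^{p} \delta_r(n+l)\, w_{rj}(l)
            = \varphi'(v_j(n)) \sum_{r \in A} \boldsymbol{\Delta}_r^T(n)\, \mathbf{w}_{rj},
\qquad
\boldsymbol{\Delta}_r(n) = [\delta_r(n), \delta_r(n+1), \ldots, \delta_r(n+p)]^T

% The hidden-layer formula needs future deltas; delaying it by l p steps
% (l = number of layers back from the output) gives a causal update
\mathbf{w}_{ji}(n+1) = \mathbf{w}_{ji}(n) + \eta\, \delta_j(n - lp)\, \mathbf{x}_i(n - lp),
\qquad
\delta_j(n - lp) = \varphi'(v_j(n - lp)) \sum_{r \in A} \boldsymbol{\Delta}_r^T(n - lp)\, \mathbf{w}_{rj}
```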
23
Recurrent I
A network is called recurrent when it has connections that feed back to previous layers or neurons, including self-connections. An example is shown next:
Successful early models of recurrent networks are:
Jordan Network
Elman Network
24
Recurrent I
The Jordan Network has the structure of an MLP with additional context units. The output neurons feed back to the context neurons in 1-1 fashion. The context units also feed back to themselves.
The network is trained using the Backpropagation algorithm.
A schematic is shown in the next figure:
25
Recurrent I
The Elman Network also has the structure of an MLP with additional context units. The hidden neurons feed back to the context neurons in 1-1 fashion. The connections from the hidden neurons to the context units are constant and equal to 1. It is also called the Simple Recurrent Network (SRN).
The network is trained using the Backpropagation algorithm.
A schematic is shown in the next figure:
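One time step of an Elman network can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name, the tanh hidden activation, the linear output layer and the omission of biases are all assumptions. The fixed 1-1 feedback with weight 1 shows up as a plain copy of the hidden layer into the context.

```python
import math


def elman_step(x, context, W_in, W_ctx, W_out):
    """One time step of a simple recurrent network: returns (output, new_context)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(W_in[j], x))
                  + sum(w * c for w, c in zip(W_ctx[j], context)))
        for j in range(len(W_in))
    ]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in W_out]
    # 1-1 feedback with fixed weight 1: the context is a copy of the hidden layer
    return output, hidden[:]
```

Feeding the returned context back in on the next call gives the network its short-term memory; a Jordan network would instead copy the output layer into the context.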
26
Recurrent II
More complex forms of recurrent networks are possible. We can start by using an MLP as the basic building block.
Typical paradigms of complex recurrent models are:
Nonlinear Autoregressive with Exogenous Inputs Network (NARX)
The State Space Model
The Recurrent Multilayer Perceptron (RMLP)
Schematic representations of the networks are given in the next slides:
27
Recurrent II-1
The structure of the NARX model includes:
An MLP static network;
A current input u(n) and its delayed versions up to a time q;
A time-delayed version of the current output y(n), which feeds back to the input layer. The memory of the delayed output vector is in general p.
The output is calculated as:
y(n+1) = F(y(n), …, y(n-p+1), u(n), …, u(n-q+1))
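The NARX recurrence can be sketched by keeping two delay lines, one for past outputs and one for past inputs. The function name `run_narx`, the zero initial conditions and the calling convention for F (two lists, newest value first) are assumptions; in the model above F would be the static MLP.

```python
from collections import deque


def run_narx(F, inputs, p, q, y0=0.0):
    """Iterate y(n+1) = F(y(n),...,y(n-p+1), u(n),...,u(n-q+1)) over an input sequence."""
    y_taps = deque([y0] * p, maxlen=p)   # y(n), y(n-1), ..., y(n-p+1)
    u_taps = deque([0.0] * q, maxlen=q)  # u(n), u(n-1), ..., u(n-q+1)
    outputs = []
    for u in inputs:
        u_taps.appendleft(u)
        y_next = F(list(y_taps), list(u_taps))
        y_taps.appendleft(y_next)        # the output is fed back as a delayed input
        outputs.append(y_next)
    return outputs
```

For instance, with the toy linear map `F = lambda ys, us: 0.5 * ys[0] + us[0] + us[1]`, p = 1 and q = 2, an impulse input `[1.0, 0.0, 0.0]` produces `[1.0, 1.5, 0.75]`.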
28
Recurrent II-2
A schematic of the NARX model is as follows:
29
Recurrent II-3
The structure of the State Space model includes:
An MLP network with a single hidden layer;
The hidden neurons define the state of the network;
A linear output layer;
A feedback of the hidden layer to the input layer, assuming a memory of q lags;
The output is determined by the coupled equations:
x(n+1) = f(x(n), u(n))
y(n+1) = C x(n+1)
30
Recurrent II-4
Where f is a suitable nonlinear function characterising the hidden layer. x is the state vector, as produced by the hidden layer; it has q components. y is the output vector and has p components. The input vector is u and has m components.
A schematic representation of the network is given below:
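The coupled state-space equations can be sketched in code. This is an illustration under assumptions: f is taken to be tanh applied to a weighted sum of state and input, and the matrix names `W_a` (q × q) and `W_b` (q × m) are hypothetical; C is the p × q output matrix from the equations above.

```python
import math


def state_space_step(x, u, W_a, W_b, C):
    """x(n+1) = f(x(n), u(n)) with f = tanh(W_a x + W_b u); y(n+1) = C x(n+1)."""
    x_next = [
        math.tanh(sum(a * xi for a, xi in zip(W_a[k], x))
                  + sum(b * ui for b, ui in zip(W_b[k], u)))
        for k in range(len(x))
    ]
    y_next = [sum(c * xk for c, xk in zip(row, x_next)) for row in C]
    return x_next, y_next
```

The state is produced by the nonlinear hidden layer and the output is a purely linear read-out of that state, which matches the structure listed on the previous slide.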
31
Recurrent II-5
The structure of the RMLP includes:
One or more hidden layers;
Feedback around each layer;
The general structure of a static MLP network;
The output is calculated as follows (assuming that $x_I$, $x_{II}$ and $x_O$ are the first, second and output layer outputs):
$x_I(n+1) = \varphi_I(x_I(n), u(n))$
$x_{II}(n+1) = \varphi_{II}(x_{II}(n), x_I(n+1))$
$x_O(n+1) = \varphi_O(x_O(n), x_{II}(n+1))$
32
Recurrent II-6
Where the functions $\varphi_I(\cdot)$, $\varphi_{II}(\cdot)$ and $\varphi_O(\cdot)$ denote the activation functions of the corresponding layers.
A schematic representation is given below:
33
Recurrent II-7
Some theorems on the computational power of recurrent networks:
Thm 1: All Turing machines may be simulated by fully connected recurrent networks built on neurons with sigmoid activation functions.
Thm 2: NARX networks with one layer of hidden neurons with bounded, one-sided saturated activation functions and a linear output neuron can simulate fully connected recurrent networks with bounded, one-sided saturated activation functions, except for a linear slowdown.
34
Recurrent II-8
Corollary: NARX networks with one hidden layer of neurons with BOSS (bounded, one-sided saturated) activation functions and a linear output neuron are Turing equivalent.
35
Recurrent II-9
The training of recurrent networks can be done with two methods:
BackPropagation Through Time
Real-Time Recurrent Learning
We can train a recurrent network in either epoch-based or continuous training operation. However, an epoch in recurrent networks does not mean the presentation of all learning patterns; rather, it denotes the length of a single sequence that we use for training. So an epoch in a recurrent network corresponds to presenting only one pattern to the network.
At the end of an epoch the network stabilises.
36
Recurrent II-10
Some useful heuristics for training are given below:
Lexicographic order of training samples should be followed, with the shortest strings of symbols being presented to the network first;
The training should begin with a small training sample, whose size should be incrementally increased as the training proceeds;
The synaptic weights of the network should be updated only if the absolute error on the training sample currently being processed by the network is greater than some prescribed criterion;
37
Recurrent II-11
The use of weight decay during training is recommended; weight decay was discussed in WK3.
The BackPropagation Through Time algorithm proceeds by unfolding a network in time. To be more specific:
Assume that we have a recurrent network N which is required to learn a temporal task starting from time $n_0$ and going all the way to time n.
Let N* denote the feedforward network that results from unfolding the temporal operation of the recurrent network N.
38
Recurrent II-12
The network N* is related to the original network N as follows:
For each time step in the interval $(n_0, n]$, the network N* has a layer containing K neurons, where K is the number of neurons contained in network N;
In every layer of network N* there is a copy of each neuron in network N;
For every time step $l \in [n_0, n]$, the synaptic connection from neuron i in layer l to neuron j in layer l+1 of the network N* is a copy of the synaptic connection from neuron i to neuron j in the network N.
The following example explains the idea of unfolding:
39
Recurrent II-13
We assume that we have a network with two neurons, which is unfolded for a number of steps, n:
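The idea of unfolding can be checked numerically on a two-neuron network. The sketch below assumes a network without external inputs whose update is x(n+1) = tanh(W x(n)); the function names and this particular update rule are illustrative assumptions. Running the recurrent network N for n steps gives the same result as one pass through the feedforward network N*, which has n layers that each hold a copy of the weights of N.

```python
import math


def step(W, x):
    """One update of the recurrent network N: x(n+1) = tanh(W x(n))."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def run_recurrent(W, x0, n_steps):
    """Iterate the recurrent network N for n_steps time steps."""
    x = x0
    for _ in range(n_steps):
        x = step(W, x)
    return x


def unfold(W, n_steps):
    """Build N*: a list of layers, each holding its own copy of the weights of N."""
    return [[row[:] for row in W] for _ in range(n_steps)]


def run_feedforward(layers, x0):
    """Propagate x0 once through the layers of the unfolded network N*."""
    x = x0
    for W_layer in layers:
        x = step(W_layer, x)
    return x
```

Because every layer of N* is a copy of the same weights, a gradient computed on N* has to be summed over the layers before it is applied to N, which is the bookkeeping that BackPropagation Through Time performs.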
40
Recurrent II-14
We now present the method of Epochwise BackPropagation Through Time.
Let the dataset used for training the network be partitioned into independent epochs, with each epoch representing a temporal pattern of interest. Let $n_0$ denote the start time of an epoch and $n_1$ its end time. We can define the following cost function:
41
Recurrent II-15
Where A is the set of indices j pertaining to those neurons in the network for which desired responses are specified, and $e_j(n)$ is the error signal at the output of such a neuron, measured with respect to some desired response.
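The cost function referred to here has the following standard form; the name $E_{\text{total}}(n_0, n_1)$ is an assumption, while A, $e_j(n)$, $n_0$ and $n_1$ are as defined in the text.

```latex
E_{\text{total}}(n_0, n_1) = \frac{1}{2} \sum_{n=n_0}^{n_1} \sum_{j \in A} e_j^2(n)
```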
42
Recurrent II-16
The algorithm proceeds as follows:
1. For a given epoch, the recurrent network starts running from some initial state until it reaches a new state, at which point the training is stopped and the network is reset to an initial state for the next epoch. The initial state doesn’t have to be the same for each epoch of training. Rather, what is important is that the initial state for the new epoch is different from the state reached by the network at the end of the previous epoch;
43
Recurrent II-17
2. First, a single forward pass of the data through the network for the interval $(n_0, n_1)$ is performed. The complete record of input data, network state (i.e. synaptic weights) and desired responses over this interval is saved;
3. A single backward pass over this past record is performed to compute the values of the local gradients:
for all $j \in A$ and $n_0 < n \le n_1$. This computation is performed by the formula:
44
Recurrent II-18
Where $\varphi'(\cdot)$ is the derivative of an activation function with respect to its argument, and $v_j(n)$ is the induced local field of neuron j.
The use of the above formula is repeated, starting from time $n_1$ and working back, step by step, to time $n_0$; the number of steps involved here is equal to the number of time steps contained in the epoch.
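The local gradient and its backward recursion can be sketched in the standard form below. Here $w_{kj}$ is taken to mean the weight of the connection from neuron j to neuron k, a convention chosen to agree with the weight $w_{ji}$ used in the update step; $E_{\text{total}}(n_0, n_1)$ is an assumed name for the epoch cost.

```latex
% Definition of the local gradient
\delta_j(n) = -\frac{\partial E_{\text{total}}(n_0, n_1)}{\partial v_j(n)}

% Backward recursion over the epoch
\delta_j(n) =
\begin{cases}
  \varphi'(v_j(n))\, e_j(n), & n = n_1 \\[4pt]
  \varphi'(v_j(n)) \left[ e_j(n) + \sum_{k \in A} w_{kj}\, \delta_k(n+1) \right], & n_0 < n < n_1
\end{cases}
```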
45
Recurrent II-19
4. Once the computation of back-propagation has been performed back to time $n_0 + 1$, the following adjustment is applied to the synaptic weight $w_{ji}$ of neuron j:
where $\eta$ is the learning rate parameter and $x_i(n-1)$ is the input applied to the ith synapse of neuron j at time n-1.
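Steps 2 to 4 can be sketched for a small fully recurrent network. The sketch assumes the update $x_j(n) = \tanh\big(\sum_i w_{ji} x_i(n-1) + u_j(n)\big)$ with a desired response for every neuron, and the standard adjustment $\Delta w_{ji} = \eta \sum_{n=n_0+1}^{n_1} \delta_j(n)\, x_i(n-1)$; the function names and the choice of tanh are assumptions. The code returns the gradient of the epoch cost, so the adjustment is minus the learning rate times the returned value.

```python
import math


def forward(W, x0, inputs):
    """Run x_j(n) = tanh(sum_i W[j][i] * x_i(n-1) + u_j(n)); return all states, x0 first."""
    states = [x0]
    for u in inputs:
        prev = states[-1]
        states.append([math.tanh(sum(w * xi for w, xi in zip(row, prev)) + uj)
                       for row, uj in zip(W, u)])
    return states


def cost(W, x0, inputs, targets):
    """Epoch cost: half the sum of squared errors over all time steps and neurons."""
    states = forward(W, x0, inputs)
    return 0.5 * sum((d - x) ** 2
                     for xs, ds in zip(states[1:], targets)
                     for x, d in zip(xs, ds))


def bptt_gradient(W, x0, inputs, targets):
    """Return dE_total/dW from one forward and one backward pass over the epoch."""
    K = len(W)
    states = forward(W, x0, inputs)          # forward pass, record saved
    grad = [[0.0] * K for _ in range(K)]
    delta_next = [0.0] * K                   # no deltas beyond the end of the epoch
    for n in range(len(inputs), 0, -1):      # n = n1, n1-1, ..., n0+1
        x, x_prev, d = states[n], states[n - 1], targets[n - 1]
        delta = []
        for j in range(K):
            back = sum(W[k][j] * delta_next[k] for k in range(K))
            # tanh'(v_j(n)) = 1 - x_j(n)^2
            delta.append((1.0 - x[j] ** 2) * ((d[j] - x[j]) + back))
        for j in range(K):
            for i in range(K):
                # dE/dw_ji = -sum_n delta_j(n) * x_i(n-1)
                grad[j][i] -= delta[j] * x_prev[i]
        delta_next = delta
    return grad
```

A useful sanity check for any such implementation is to compare the returned gradient with a finite-difference estimate obtained by perturbing each weight in `cost`.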
46
Recurrent II-20
There is a potential problem with the method, called the Vanishing Gradients Problem: the corrections calculated for the weights are not large enough when using methods based on steepest descent.
This is currently a research problem, and one has to consult the literature for details.
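A toy calculation illustrates why the corrections can become too small. The sketch assumes the scalar recurrence x(n+1) = tanh(w x(n)), which is a hypothetical example and not taken from the slides: by the chain rule, the sensitivity of the state after n steps to the initial state is a product of n factors w · tanh'(v), each of magnitude at most |w|, so for |w| < 1 it shrinks at least geometrically with the length of the epoch.

```python
import math


def backpropagated_factor(w, x0, n_steps):
    """Return |d x(n_steps) / d x(0)| for the scalar recurrence x(n+1) = tanh(w * x(n))."""
    x, factor = x0, 1.0
    for _ in range(n_steps):
        x = math.tanh(w * x)
        factor *= w * (1.0 - x ** 2)  # chain rule: one factor w * tanh'(v) per time step
    return abs(factor)
```

With w = 0.5 the factor after 20 steps is already below 0.5 to the power 20, so error signals from the end of a long epoch contribute almost nothing to the gradient at its start.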
47
Conclusions
Dynamic networks learn sequences, in contrast to the static mappings of MLP and RBF networks.
Time representation takes place explicitly or implicitly.
The implicit form uses either time-delayed versions of the input vector followed by a static network model, or recurrent networks.
The explicit form uses a generalisation of the MLP model in which a synapse is modelled as a weight vector rather than a single number. The synapse activation is no longer the product of the synapse’s weight with the output of the previous neuron, but rather the inner product of the synapse’s weight vector with the time-delayed state vector of the previous neuron.
48
Conclusions I
The extended MLP networks with explicit temporal structure are trained with the Temporal BackPropagation algorithm.
The recurrent networks include a number of simple and complex architectures. In the simpler cases we train the networks using the standard BackPropagation algorithm. In the more complex cases we first unfold the network in time and then train it using the BackPropagation Through Time algorithm.