Financial Data Modelling


Financial Data Modelling
Dr Nikolay Nikolaev
Department of Computing, Goldsmiths College, University of London, 2018

Dynamic Nonlinear Models (Lecture 3, FDM 2018)

When processing time series data, feedforward TDNNs, which are static by design, accommodate time using sliding window vectors (also called tapped delay lines). The sliding input window shifts over the data series at each discrete time step, taking in the next data point and dropping the oldest one (according to a predefined lag/dimension). In this way, delay windows allow static networks to process temporal patterns.

The TDNN neural network models have several drawbacks:
- they limit the duration of the temporal events they can represent, because they have no implicit memory and require the lag space and delay time to be determined in advance;
- they have difficulty capturing long-term temporal relationships in the data;
- they are trained with static learning algorithms (such as backpropagation with standard optimizers).

Proper handling of sequential time series data with single-layer and multilayer neural networks is accomplished by adding memory that retains past outputs. Adding feedback connections to the neural network structure gives the model the potential to capture temporal relationships between serial data better, as well as to describe better the hidden dynamics of the unknown data generator.
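The tapped-delay-line construction can be sketched in a few lines of Python (the lecture's exercises use Matlab; the function name and toy series here are illustrative):

```python
import numpy as np

def sliding_windows(series, p):
    # Build tapped-delay-line inputs: row t holds [x_{t-1}, ..., x_{t-p}]
    # (most recent value first) and the corresponding target x_t.
    # The lag p must be fixed in advance, which is a TDNN limitation.
    X, y = [], []
    for t in range(p, len(series)):
        X.append(series[t - p:t][::-1])
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.arange(10.0)            # toy series 0..9
X, y = sliding_windows(series, p=3)
print(X[0], y[0])                   # [2. 1. 0.] 3.0
```

Each window is one static input vector, so a feedforward network sees time only through this fixed embedding.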

Having feedback connections makes these recurrent neural networks dynamic systems; it is this memory that makes recurrent networks powerful tools for learning temporal dependencies in serial data. Memory also renders such recurrent networks especially suitable for describing non-stationarity, so they are particularly useful for learning from nonstationary time series.

There are two main advantages of having memory in neural network models:
- the memory stores the state of the dynamical neural system and determines the evolution of the output;
- the memory enables learning of longer time dependencies without the need to determine the input size accurately in advance; in other words, it makes it possible to learn with imprecise embeddings from time series data.

A common learning framework is again the maximum likelihood estimation (MLE) method, but for treating nonlinear models it is usually implemented with exact derivatives rather than numerical integration.

Dynamic NARMA Models

Nonlinear versions of autoregressive moving average (NARMA) models can be developed using neural network representations. These are NARMA connectionist architectures in which the latest time series measurements are passed as inputs together with fed-back past outputs. The NARMA model is defined as follows:

$$y_t = f(\mathbf{x}_t, \mathbf{e}_t) + \varepsilon_t = f(x_{t-1}, x_{t-2}, \ldots, x_{t-p},\; \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}) + \varepsilon_t$$

where $\varepsilon_{t-1} = y_{t-1} - f(\mathbf{x}_{t-1}, \mathbf{e}_{t-1})$ are the recent prediction errors.

Consider a simple recurrent single-neuron (Perceptron) network having the following inputs:

$$z_{t-l} = \begin{cases} 1 & \text{if } l = 0 \\ x_{t-l} & \text{if } 1 \le l \le p \\ f_{t-l+p} & \text{if } p+1 \le l \le p+q \end{cases}$$

where $p$ is the number of lagged inputs, and $q$ is the number of recurrent connections.
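The assembly of the input vector $\mathbf{z}$ can be sketched in Python (the function name and toy values are illustrative):

```python
def narma_inputs(x_lags, f_lags):
    # Assemble the z vector of the recurrent single-neuron NARMA model:
    # z_0 = 1 (bias), z_1..z_p = lagged series values x_{t-1}..x_{t-p},
    # z_{p+1}..z_{p+q} = fed-back past network outputs f_{t-1}..f_{t-q}.
    return [1.0] + list(x_lags) + list(f_lags)

z = narma_inputs([0.9524, 0.9163], [0.5])   # p = 2 lags, q = 1 feedback
print(z)                                     # [1.0, 0.9524, 0.9163, 0.5]
```

The fed-back outputs stand in for the moving-average error terms of the linear ARMA model.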

Assuming that the output node uses the $\tanh$ activation function, the model computes:

$$f(\mathbf{x}_t, \mathbf{e}_t) = \tanh\!\left( \sum_{l=1}^{p} w_l x_{t-l} + \sum_{l=p+1}^{p+q} w_l f_{t-l+p} + w_0 \right)$$

where the temporal variables capture information from the past and send it back via the loop, thus providing memory capacity. This is what helps to capture time-varying patterns in data.
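This forward computation is a one-liner; here is a minimal Python sketch with hypothetical weights and inputs:

```python
import math

def narma_output(w0, w, z):
    # Single recurrent tanh neuron: f = tanh(w0 + sum_l w_l * z_l),
    # where z mixes the p lagged inputs with the q fed-back past outputs.
    s = w0 + sum(wl * zl for wl, zl in zip(w, z))
    return math.tanh(s)

# Toy weights and inputs (p = 2 lags plus one fed-back output):
f = narma_output(0.1, [0.2, -0.3, 0.4], [0.9, 0.8, 0.5])
print(round(f, 4))   # 0.2355
```

The squashing nonlinearity keeps the fed-back signal bounded, which helps the recurrence stay stable.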

Training NARMA Networks

There are two algorithms for computing dynamic gradients in such recurrent single-neuron networks:
- BackPropagation-Through-Time (BPTT): unfolds the network back in time and calculates the error derivatives backwards over the expansion;
- Real-Time Recurrent Learning (RTRL): computes the error derivatives forward in time.

Having dynamic, temporal derivatives, one can plug them into a standard optimizer or implement gradient-descent training with first-order or second-order methods.

Online Gradient Descent Training

The first-order online gradient-descent training algorithm updates the weights at each time step in the direction opposite to the instantaneous gradient of the cost function:

$$w_{j,t} = w_{j,t-1} - \eta \frac{\partial C_t}{\partial w_{j,t-1}} = w_{j,t-1} + \eta\, \varepsilon_t\, f_t' \frac{\partial s_t}{\partial w_{j,t-1}}$$

where $f_t'$ denotes the derivative of the activation function, $\varepsilon_t = y_t - f(\mathbf{x}_t, \mathbf{w}_t)$ is the error, and $s_t$ is the summation at the output node:

$$s_t = \sum_{l=1}^{p} w_l x_{t-l} + \sum_{l=p+1}^{p+q} w_l f_{t-l+p} + w_0.$$

This update is obtained according to the maximum likelihood principle starting from the instantaneous cost function $C_t = 0.5\,(y_t - f(\mathbf{x}_t, \mathbf{w}_t))^2$. The so-called Real-Time Recurrent Learning (RTRL) derivatives are calculated using the chain rule in the following way:

$$\frac{\partial C_t}{\partial w_j} = \frac{\partial C_t}{\partial f_t}\,\frac{\partial f_t}{\partial s_t}\,\frac{\partial s_t}{\partial w_j} = -\varepsilon_t\,\bigl(1 - f(\mathbf{x}_t, \mathbf{w}_t)^2\bigr)\,\frac{\partial s_t}{\partial w_j}$$

where the derivative of the tanh activation function is $f_t' = 1 - f(\mathbf{x}_t, \mathbf{w}_t)^2$, and the time subscripts for the weights are omitted for clarity.
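One online step can be sketched in Python; the helper name and toy values are illustrative, and the dynamic derivatives $\partial s_t/\partial w_j$ are assumed to be supplied by the RTRL recursion:

```python
import math

def online_gd_step(w, z, p, target, eta=0.1):
    # One online update of the recurrent tanh neuron:
    # forward pass, instantaneous error, then
    #   w_j <- w_j + eta * eps_t * f'_t * (ds_t/dw_j),
    # where p[j] carries the dynamic derivative ds_t/dw_j from RTRL.
    s = sum(wj * zj for wj, zj in zip(w, z))
    f = math.tanh(s)
    eps = target - f
    fprime = 1.0 - f * f
    return [wj + eta * eps * fprime * pj for wj, pj in zip(w, p)]

w_new = online_gd_step([0.0, 0.5], [1.0, 0.2], [1.0, 0.2], target=0.3)
print([round(v, 4) for v in w_new])
```

With a static network, p would simply equal z; the recurrent case differs exactly in how p is propagated.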

Temporal RTRL Derivatives

The derivatives of the output-node summation with respect to the weights are taken as follows:

$$\frac{\partial s_t}{\partial w_j} = \frac{\partial \left( \sum_{l=1}^{p+q} w_l z_{t-l} + w_0 \right)}{\partial w_j} = \sum_{l=1}^{p+q} \left( w_l \frac{\partial z_{t-l}}{\partial w_j} + z_{t-l} \frac{\partial w_l}{\partial w_j} \right) = \sum_{l=p+1}^{p+q} w_l \frac{\partial f_{t-l+p}}{\partial w_j} + z_{t-j}$$

where the assumption is that $\partial x_{t-l}/\partial w_j = 0$. Note that the first term accounts for the implicit effect of weight $w_j$ on the network output (through the recurrent loop), while the second term is the explicit effect of this weight on the network summation.

With this machinery for training a recurrent single-neuron network, one can also design recurrent multilayer Perceptrons when severe nonlinearities are present in the data (after performing initial checks with some diagnostic tests).
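Specialised to one feedback connection ($q = 1$), the forward-in-time recursion can be sketched in Python (the function name and values are illustrative):

```python
def sensitivity_update(w_fb, p_prev, z):
    # Forward-in-time RTRL recursion for a single neuron with one
    # feedback connection (q = 1):
    #   ds_t/dw_j = w_fb * (df_{t-1}/dw_j) + z_j
    # The first term is the implicit effect of w_j through the loop,
    # the second is its explicit effect on the current summation.
    return [w_fb * pj + zj for pj, zj in zip(p_prev, z)]

p_new = sensitivity_update(0.5, [0.2, 0.4], [1.0, 2.0])
print([round(v, 4) for v in p_new])   # [1.1, 2.2]
```

The sensitivities are carried forward alongside the network state, which is what lets RTRL run online without unfolding the network in time.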

Example: RTRL training of a nonlinear recurrent single-layer network

Consider a simple network with one output node, one external input, a bias constant, and an output-to-input feedback connection. The output node has a hyperbolic tangent activation function. Suppose that the initial weights are $\mathbf{w}_t = [-0.0391\;\; 0.1461\;\; 0.0779]$, the gradients from the previous time step are $\partial s_t/\partial \mathbf{w}_t = [0.1\;\; 0.2\;\; 0.3]$, and the learning rate is $\eta = 0.1$. The given time series data are: external input $x_{t-1} = 0.9524$, and target $x_t = 0.9801$.

Assuming that the network output generated from the previous data point is $f_{t-1} = 0.5$, the error is calculated as follows:

$$\varepsilon_t = x_t - f_{t-1} = 0.9801 - 0.5 = 0.4801.$$

Next, the input vector is constructed: $\mathbf{z}_t = [1.0\;\; x_{t-1}\;\; f_{t-1}] = [1.0\;\; 0.9524\;\; 0.5]$. Then, we perform the forward propagation:

$$s_t = w_1 z_1 + w_2 z_2 + w_3 z_3 = -0.0391 \cdot 1 + 0.1461 \cdot 0.9524 + 0.0779 \cdot 0.5 = 0.1390$$
$$f_t = \tanh(s_t) = 0.1381$$

Example: RTRL training (continuation)

After that, the error is multiplied by the activation derivative (taken, as above, at the previous output $f_{t-1} = 0.5$):

$$\varepsilon_t\, f_t' = \varepsilon_t\,(1 - f_{t-1}^2) = 0.4801 \cdot (1 - 0.5^2) = 0.3601$$

Having the past derivatives, the weight deltas are computed as follows:

$$\eta\, \varepsilon_t f_t'\, \partial s_t/\partial w_1 = 0.1 \cdot 0.3601 \cdot 0.1 = 0.0036$$
$$\eta\, \varepsilon_t f_t'\, \partial s_t/\partial w_2 = 0.1 \cdot 0.3601 \cdot 0.2 = 0.0072$$
$$\eta\, \varepsilon_t f_t'\, \partial s_t/\partial w_3 = 0.1 \cdot 0.3601 \cdot 0.3 = 0.0108$$

Therefore, the weights are updated in the following way:

$$w_1 = -0.0391 + 0.0036 = -0.0355$$
$$w_2 = 0.1461 + 0.0072 = 0.1533$$
$$w_3 = 0.0779 + 0.0108 = 0.0887$$

Example: RTRL training (continuation)

Finally, the derivatives for the next time step are produced as follows:

$$\partial s_t/\partial w_1 = w_3\, \partial f_t/\partial w_1 + z_1 = 0.0887 \cdot 0.1 + 1.0 = 1.0089$$
$$\partial s_t/\partial w_2 = w_3\, \partial f_t/\partial w_2 + z_2 = 0.0887 \cdot 0.2 + 0.9524 = 0.9701$$
$$\partial s_t/\partial w_3 = w_3\, \partial f_t/\partial w_3 + z_3 = 0.0887 \cdot 0.3 + 0.5 = 0.5266$$
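The whole example step can be reproduced end to end with a short Python script (an illustrative re-implementation, not the lecture's code; the derivative is taken at the previous output, as in the worked numbers):

```python
import math

# Reproduce the worked RTRL example: tanh output node, one external
# input, a bias, and one output-to-input feedback connection.
w = [-0.0391, 0.1461, 0.0779]    # [bias, input, feedback] weights
p = [0.1, 0.2, 0.3]              # previous-step sensitivities ds/dw_j
eta = 0.1
x_prev, target = 0.9524, 0.9801
f_prev = 0.5                     # network output from the previous step

z = [1.0, x_prev, f_prev]        # input vector [1, x_{t-1}, f_{t-1}]
eps = target - f_prev            # error, ~0.4801
s = sum(wj * zj for wj, zj in zip(w, z))   # forward pass, ~0.139
f = math.tanh(s)                           # ~0.1381

# Weight update: delta_j = eta * eps * (1 - f_prev^2) * ds/dw_j
grad = eps * (1.0 - f_prev ** 2)           # ~0.3601
w = [wj + eta * grad * pj for wj, pj in zip(w, p)]

# Sensitivities for the next step: ds/dw_j = w_fb * p_j + z_j
p = [w[2] * pj + zj for pj, zj in zip(p, z)]

print([round(v, 4) for v in w])   # [-0.0355, 0.1533, 0.0887]
print([round(v, 4) for v in p])   # [1.0089, 0.9701, 0.5266]
```

Running the script confirms the hand-computed values of the example.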

Exercise: Programming the RTRL algorithm in Matlab

First we need to initialize all data structures and load the time series data:

```matlab
NVCTS = odim; NOUTS = 1; NINPUTS = 10; m = NINPUTS+1;
NNODES = 3; nrows = NNODES; NCOLS = m+NNODES; eta = 0.1;
e = zeros(NNODES,1); s = zeros(NNODES,1);
out = zeros(NNODES,1); yprime = zeros(NNODES,1);
w = zeros(NNODES,NCOLS); delw = zeros(NNODES,NCOLS);
z = zeros(NVCTS,NCOLS); d = zeros(NVCTS,NCOLS);
p = zeros(NNODES,NCOLS,NNODES); pold = zeros(NNODES,NCOLS,NNODES);
z(:,1) = 1.0;
for i = 1:NVCTS
    for j = 1:NINPUTS
        z(i,j+1) = x(i,j);        % load the input vectors
    end
    for j = 1:NOUTS
        d(i,j) = targets(i);      % load the targets
    end
end
w = 0.5*(rand(NNODES,NCOLS)-0.5); % initialize the weights
```

Exercise: Programming the RTRL algorithm in Matlab (continuation)

Next we develop the training loops to iterate over the data, starting with forward propagation (the fed-back outputs are stored before any summation, so every node sees the complete input row):

```matlab
for epoch = 1:50
    for t = 1:NVCTS
        % Compute the error at the output nodes
        for k = 1:NOUTS
            e(k) = d(t,k) - out(k);
        end
        % Feed the previous outputs back as part of the next input z(t,k+m)
        for k = 1:NNODES
            z(t,k+m) = out(k);
        end
        % Generate the summations at each of the k nodes
        for k = 1:NNODES
            s(k) = 0.0;
            for i = 1:NCOLS
                s(k) = s(k) + w(k,i)*z(t,i);
            end
        end
```

Exercise: Programming the RTRL algorithm in Matlab (continuation)

After that, we compute the node outputs and update the weights (summing the error over the output nodes):

```matlab
        % Compute the outputs at time (t+1): out(k) = f(s(k))
        for k = 1:NOUTS
            out(k) = s(k);                    % linear output node
        end
        for k = 1:NNODES-NOUTS
            out(k+NOUTS) = tanh(s(k+NOUTS));  % tanh hidden nodes
        end
        % Compute the weight changes at time t using the sensitivities pold
        for i = 1:NNODES
            for j = 1:NCOLS
                delw(i,j) = 0.0;
                for k = 1:NOUTS
                    delw(i,j) = delw(i,j) + eta*e(k)*pold(i,j,k);
                end
            end
        end
        % Update the weights for time (t+1)
        w = w + delw;
```

Exercise: Programming the RTRL algorithm in Matlab (continuation)

Finally, the temporal sensitivity matrix is computed for the next iteration:

```matlab
        % Activation derivatives: 1 for the linear output, 1 - f^2 for tanh
        for k = 1:NOUTS
            yprime(k) = 1.0;
        end
        for k = 1:NNODES-NOUTS
            yprime(k+NOUTS) = 1 - out(k+NOUTS)^2;
        end
        % RTRL recursion:
        % p(i,j,k) = f'(s_k) * ( sum_l w(k,l+m)*pold(i,j,l) + kron(i,k)*z(t,j) )
        for i = 1:NNODES
            for j = 1:NCOLS
                for k = 1:NNODES
                    kron = 0.0;
                    if (i == k)
                        kron = 1.0;
                    end
                    ssum = 0.0;
                    for l = 1:NNODES
                        ssum = ssum + w(k,l+m)*pold(i,j,l);  % pold = p(t)
                    end
                    p(i,j,k) = yprime(k)*(ssum + kron*z(t,j));
                end
            end
        end
        ptemp = pold; pold = p; p = ptemp;   % pold now holds p(t+1)
    end
end
```

References:

R.J. Williams and D. Zipser (1995). Gradient-based learning algorithms for recurrent networks and their computational complexity. In: Y. Chauvin and D.E. Rumelhart (Eds.), Back-propagation: Theory, Architectures and Applications, Chapter 13, Lawrence Erlbaum Publishers, Hillsdale, NJ, pp. 433-486.

S. Haykin (1997). Neural Networks: A Comprehensive Foundation (2nd ed.), Pearson Higher Education, Upper Saddle River, NJ.

N. Nikolaev and H. Iba (2006). Adaptive Learning of Polynomial Networks: Genetic Programming, Backpropagation and Bayesian Methods, Springer, New York.