Support Vector Machine With Adaptive Parameters in Financial Time Series Forecasting, by L. J. Cao and Francis E. H. Tay, IEEE Transactions on Neural Networks, Vol. 14, No. 6, Nov 2003

Presentation transcript:

Support Vector Machine With Adaptive Parameters in Financial Time Series Forecasting
L. J. Cao and Francis E. H. Tay, IEEE Transactions on Neural Networks, Vol. 14, No. 6, Nov 2003
Presented by Pooja Hegde
CIS 525: Neural Computation, Spring 2004
Instructor: Dr. Vucetic

Presentation Outline
- Introduction: motivation and introduction of a novel approach, the SVM
- Background: SVMs in regression estimation
- Application of SVMs in financial forecasting: experimental setup and results
- Experimental analysis of SVM parameters and results
- Adaptive Support Vector Machine (ASVM): experimental setup and results
- Conclusions

Introduction
Financial time series are one of the most challenging applications of modern time series forecasting. Their characteristics:
- Noisy: complete information from the past behavior of financial markets is unavailable, so the dependency between future and past prices cannot be fully captured.
- Non-stationary: the distribution of a financial time series changes over time. The learning algorithm needs to incorporate this characteristic by giving information from recent data points more weight than information from distant data points.

Introduction
Back-propagation (BP) neural networks have been used successfully to model financial time series. BP networks are universal function approximators that can map any nonlinear function without a priori assumptions about the properties of the data; their non-parametric, noise-tolerant, and adaptive properties make them effective at describing the dynamics of non-stationary time series. Then what's the problem?
- They need a large number of controlling parameters.
- It is difficult to obtain a stable solution.
- There is a danger of overfitting: the network captures not just the useful information in the training data but also the unwanted noise, which leads to poor generalization.

A Novel Approach: SVMs
Support Vector Machines are used in a number of areas ranging from pattern recognition to regression estimation. The reason: the remarkable characteristics of SVMs.
- Good generalization performance: SVMs implement the Structural Risk Minimization principle, which seeks to minimize an upper bound on the generalization error rather than only the training error.
- Absence of local minima: training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so the solution of an SVM is unique and globally optimal.
- Sparse representation of the solution: in an SVM, the solution depends only on a subset of the training data points, called support vectors.

Background
Theory of SVMs in Regression Estimation
- Given a set of data points (x₁, y₁), (x₂, y₂), …, (x_l, y_l) generated randomly and independently from an unknown function, the SVM approximates the function using

    f(x) = w · φ(x) + b

  where φ(x) maps the input into a high-dimensional feature space.
- The coefficients w and b are estimated by minimizing the regularized risk function

    R = C · (1/l) Σᵢ L_ε(yᵢ, f(xᵢ)) + (1/2) ‖w‖²

  where L_ε(y, f(x)) = max(0, |y − f(x)| − ε) is the ε-insensitive loss function.
- To estimate w and b, the above is transformed into the primal problem by introducing positive slack variables ξᵢ and ξᵢ*:

    minimize   (1/2) ‖w‖² + C Σᵢ (ξᵢ + ξᵢ*)
    subject to yᵢ − w · φ(xᵢ) − b ≤ ε + ξᵢ,
               w · φ(xᵢ) + b − yᵢ ≤ ε + ξᵢ*,
               ξᵢ, ξᵢ* ≥ 0.

Background
Theory of SVMs in Regression Estimation (contd.)
- Introducing Lagrange multipliers and exploiting the optimality constraints, the decision function takes the following explicit form:

    f(x) = Σᵢ (aᵢ − aᵢ*) K(xᵢ, x) + b

- aᵢ and aᵢ* are the Lagrange multipliers. They satisfy the equalities aᵢ · aᵢ* = 0, aᵢ ≥ 0, aᵢ* ≥ 0, and are obtained by maximizing the dual function, which has the following form:

    W(a, a*) = Σᵢ yᵢ (aᵢ − aᵢ*) − ε Σᵢ (aᵢ + aᵢ*) − (1/2) Σᵢ Σⱼ (aᵢ − aᵢ*)(aⱼ − aⱼ*) K(xᵢ, xⱼ)

    subject to Σᵢ (aᵢ − aᵢ*) = 0 and aᵢ, aᵢ* ∈ [0, C].
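The dual above is exactly what off-the-shelf ε-SVR solvers optimize. A minimal sketch using scikit-learn (an assumption, since the authors did not publish code; note that sklearn's gamma equals 1/δ² in the paper's Gaussian-kernel notation, and the data here is synthetic):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # five inputs, as in the paper's setup
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # synthetic stand-in target

delta_sq = 1.0                                     # illustrative kernel width (delta^2)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=1.0 / delta_sq)
model.fit(X, y)                                    # solves the dual QP above

print("support vectors:", len(model.support_))     # the sparse subset defining f(x)
```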

Feasibility of Applying SVM in Financial Forecasting
Experimental setup:
- Data sets: the daily closing prices of five real futures contracts from the Chicago Mercantile Market.
- The original closing price is transformed into a five-day relative difference in percentage of price (RDP).

Feasibility of Applying SVM in Financial Forecasting
- Input variables: four lagged RDP values based on 5-day periods (RDP-5, RDP-10, RDP-15, RDP-20) and one transformed closing price (EMA100). Output variable: RDP+5. (A construction sketch follows below.)
- Z-score normalization is used to normalize the time series, which contains outliers.
- A walk-forward testing routine is used to divide the whole dataset into five overlapping training-validation-testing sets.
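A hedged sketch of the feature construction just described; the column names and helper function are illustrative rather than taken from the paper, and the paper additionally smooths the closing price with a short EMA before computing RDP+5, which is omitted here:

```python
import pandas as pd

def make_features(close: pd.Series) -> pd.DataFrame:
    """Build the four lagged RDP inputs, the EMA100 input, and the RDP+5 target."""
    def rdp(lag: int) -> pd.Series:
        # k-day relative difference in percentage of price
        return (close - close.shift(lag)) / close.shift(lag) * 100

    feats = pd.DataFrame({
        "RDP-5": rdp(5), "RDP-10": rdp(10), "RDP-15": rdp(15), "RDP-20": rdp(20),
        # EMA100: closing price minus its 100-day exponential moving average
        "EMA100": close - close.ewm(span=100, adjust=False).mean(),
    })
    # Target: five-day-ahead relative difference in percentage of price
    feats["RDP+5"] = (close.shift(-5) - close) / close * 100
    return feats.dropna()
```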

Feasibility of Applying SVM in Financial Forecasting
Performance criteria (sketches of all three follow below):
- NMSE and MAE: measures of deviation between the actual and predicted values; smaller values of NMSE and MAE indicate a better predictor.
- DS: an indication of the correctness of the predicted direction of RDP+5, given as a percentage; a larger value of DS suggests a better predictor.
Model:
- The Gaussian kernel is used as the kernel function of the SVM.
- The results on the validation set are used to choose the optimal kernel parameters (C, ε, and δ²) of the SVM.
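The three criteria can be written down directly; a minimal sketch assuming 1-D NumPy arrays of actuals y and predictions y_hat (these are the standard formulas, which may differ from the paper's exact conventions in minor details such as normalization constants):

```python
import numpy as np

def nmse(y, y_hat):
    # MSE normalized by the variance of the actual series;
    # values near 1.0 mean the predictor barely beats the series mean.
    return np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def ds(y, y_hat):
    # Percentage of steps where the predicted direction of change is correct.
    correct = (y[1:] - y[:-1]) * (y_hat[1:] - y[:-1]) >= 0
    return 100.0 * np.mean(correct)
```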

Feasibility of Applying SVM in Financial Forecasting
Benchmarks:
- Standard 3-layer BP neural network with 5 input nodes and 1 output node. The number of hidden nodes, the learning rate, and the number of epochs are chosen based on the validation set. Sigmoid transfer functions are used for the hidden nodes and a linear transfer function for the output node; the network is trained with the stochastic gradient descent method. (A rough stand-in sketch follows below.)
- Regularized RBF neural network, which minimizes a risk function consisting of the empirical error and a regularization term. The software used was developed by Müller et al. The centers, variances, and output weights are adjusted; the number of hidden nodes and the regularization parameter are chosen based on the validation set.
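A rough stand-in for the BP benchmark, not the authors' implementation: a 3-layer network with sigmoid hidden units, a linear output, and stochastic gradient descent, via scikit-learn's MLPRegressor (the hidden-node count and learning rate are placeholders; the paper selects them on the validation set):

```python
from sklearn.neural_network import MLPRegressor

bp = MLPRegressor(
    hidden_layer_sizes=(10,),   # placeholder; chosen on validation data in the paper
    activation="logistic",      # sigmoid hidden nodes
    solver="sgd",               # stochastic gradient descent
    learning_rate_init=0.01,
    max_iter=500,
    random_state=0,
)
# bp.fit(X_train, y_train)      # MLPRegressor's output layer is linear, matching the setup
```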

Results
- In all futures contracts, the largest values of NMSE and MAE occur with the RBF neural network.
- In CME-SP, CBOT-US, and EUREX-BUND, SVM has smaller NMSE and MAE values but BP has smaller values for DS; the reverse is true for CBOT-BO and MATIF-CAC40.
- All values of NMSE are near or larger than 1.0, indicating that the financial datasets are very noisy.
- The smallest values of NMSE and MAE occur with SVM, followed by the RBF neural network.
- In terms of DS, the results are comparable among the three methods.

Results
- In CME-SP, CBOT-BO, EUREX-BUND, and MATIF-CAC40, the smallest values of NMSE and MAE are found with SVM, followed by the RBF neural network.
- In CBOT-US, BP has the smallest NMSE and MAE, followed by RBF.
- Paired t-test: SVM and RBF outperform BP at the α = 5% significance level (one-tailed test); there is no significant difference between SVM and RBF.

Experimental Analysis of Parameters C and δ²

Results
- Too small a value of δ² causes the SVM to overfit the training data, while too large a value causes the SVM to underfit the training data.
- A small value of C underfits the training data; when C is too large, the SVM overfits the training set, deteriorating generalization performance.
- δ² and C therefore play an important role in the generalization performance of the SVM. (A validation-based selection sketch follows below.)
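A hedged sketch of the validation-based selection of C and δ² implied above (the grid values are illustrative, and the paper uses a fixed validation split from the walk-forward routine rather than cross-validation):

```python
import numpy as np
from sklearn.svm import SVR

def select_c_delta(X_tr, y_tr, X_val, y_val,
                   Cs=(0.1, 1, 10, 100), delta_sqs=(0.5, 1, 5, 25)):
    """Return the (C, delta^2) pair with the lowest validation MSE."""
    best_C, best_d2, best_err = None, None, np.inf
    for C in Cs:
        for d2 in delta_sqs:
            m = SVR(kernel="rbf", C=C, gamma=1.0 / d2, epsilon=0.1).fit(X_tr, y_tr)
            err = np.mean((m.predict(X_val) - y_val) ** 2)
            if err < best_err:
                best_C, best_d2, best_err = C, d2, err
    return best_C, best_d2, best_err
```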

Experimental Analysis of Parameter ε
- NMSE on the training and validation sets is very stable and relatively unaffected by changes in ε: the performance of the SVM is insensitive to ε. This result cannot be generalized, however, because the effect of ε on performance depends on the input dimension of the dataset.
- The number of support vectors is a decreasing function of ε. Hence a large ε reduces the number of support vectors without affecting the performance of the SVM. (Illustrated in the sketch below.)
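A small sketch illustrating the observation about ε and sparsity on synthetic data (exact counts will vary with the data and the other parameters):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

for eps in (0.01, 0.05, 0.1, 0.2, 0.4):
    m = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    # The number of support vectors shrinks as the epsilon-tube widens.
    print(f"epsilon={eps:<4}  support vectors={len(m.support_)}")
```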

Support Vector Machine with Adaptive Parameters (ASVM)
Modification of parameter C:
- In the regularized risk function (empirical error + regularization term), increasing the value of C increases the relative importance of the empirical error with respect to the regularization term.
- ASVM replaces the single C with point-dependent weights Cᵢ = C · 2 / (1 + exp(a − 2a·i/l)), a form reconstructed here from the limiting behaviors below.
- The behavior of the weight function can be summarized as follows:
  - As a → 0, lim Cᵢ = C, hence E_ASVM = E_SVM.
  - As a → ∞, Cᵢ → 0 for the first half of the training points and Cᵢ → 2C for the second half.
  - For a between these extremes, as a increases the weights for the first half of the training data points become smaller and those for the second half become larger.

Support Vector Machine with Adaptive Parameters (ASVM)
Modification of parameter ε:
- To make the solution of the SVM sparser, ε adopts the point-dependent form εᵢ = ε · (1 + exp(b − 2b·i/l)) / 2, again reconstructed from the limiting behaviors below.
- The proposed adaptive ε places more weight on recent training points than on distant ones: since the number of support vectors is a decreasing function of ε, recent training points obtain more attention in the representation of the solution than distant points.
- The behavior of the weight function can be summarized as follows (a code sketch follows below):
  - As b → 0, lim εᵢ = ε; the weights on all training data points equal 1.0.
  - As b → ∞, the weights for the first half of the training points grow without bound while those for the second half shrink toward 1/2.
  - For b between these extremes, as b increases the weights for the first half of the training data points become larger and those for the second half become smaller.
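A hedged sketch of the two weight functions, written to match the limiting behaviors listed on these slides rather than quoted from the paper (a and b control how steeply distant points are down-weighted):

```python
import numpy as np

def adaptive_C(C: float, a: float, l: int) -> np.ndarray:
    """Ascending regularization constants C_i: recent points get larger C_i."""
    i = np.arange(1, l + 1)
    return C * 2.0 / (1.0 + np.exp(a - 2.0 * a * i / l))

def adaptive_eps(eps: float, b: float, l: int) -> np.ndarray:
    """Descending tube widths eps_i: recent points get smaller eps_i."""
    i = np.arange(1, l + 1)
    return eps * (1.0 + np.exp(b - 2.0 * b * i / l)) / 2.0

# Sanity checks against the limits above: a -> 0 gives C_i = C for all i,
# and b -> 0 gives eps_i = eps (weights of 1.0) for all i.
```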

Adaptive Parameters (ASVM) and the Weighted BP Neural Network (WBP)
- Regularized risk function in ASVM: R = (1/l) Σᵢ Cᵢ L_εᵢ(yᵢ, f(xᵢ)) + (1/2) ‖w‖². The corresponding dual function is the one given earlier with the box constraints aᵢ, aᵢ* ∈ [0, Cᵢ] and with ε replaced by the point-dependent εᵢ (a reconstruction consistent with the standard derivation).
- Weighted BP neural network: for a fair comparison, the same weighting is applied to each training point's error term, so each point's contribution to the gradient-descent weight update is scaled by its weight and recent points drive the update more strongly.
- A sample_weight-based approximation of the adaptive Cᵢ is sketched below.
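The adaptive Cᵢ can be approximated with off-the-shelf tools: scikit-learn's SVR accepts a per-sample sample_weight that rescales C for each point. The adaptive εᵢ has no sklearn counterpart, so this hedged sketch covers only half of ASVM, on synthetic stand-in data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 5))                       # stand-in training split
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=200)

l, a = len(X_tr), 3.0                                  # a is illustrative; tuned on validation data
weights = 2.0 / (1.0 + np.exp(a - 2.0 * a * np.arange(1, l + 1) / l))

asvm = SVR(kernel="rbf", C=10.0, epsilon=0.1)
asvm.fit(X_tr, y_tr, sample_weight=weights)            # sample_weight rescales C per point
print("support vectors:", len(asvm.support_))
```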

Results of ASVM
- ASVM and WBP have smaller NMSE and MAE but larger DS than their corresponding standard methods.
- Paired t-tests: ASVM outperforms SVM at α = 2.5%; WBP outperforms BP at α = 10%; ASVM outperforms WBP at α = 5%.
- ASVM converges to fewer support vectors than the standard SVM.

Conclusions
- SVM is a promising alternative to the BP neural network for financial time series forecasting.
- The regularized RBF neural network and the SVM achieve comparable performance.
- C and δ² have a great influence on the performance of the SVM.
- The number of support vectors can be reduced by using a larger ε, resulting in a sparse representation of the solution.
- ASVM achieves higher generalization performance and uses fewer support vectors than the standard SVM in financial forecasting.
- Future work: investigate techniques for choosing optimal values of the free parameters of ASVM, and explore more sophisticated weight functions that closely follow the dynamics of the time series to further improve the performance of ASVM.

THANK YOU!!!!