Incremental Reduced Support Vector Machines Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang National Taiwan University of Science and Technology and Institute.

1 Incremental Reduced Support Vector Machines
Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang
National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica
2003 International Conference on Informatics, Cybernetics, and Systems, ISU, Kaohsiung, Dec. 14 2003

2 Outline
- Support Vector Machines for classification problems
  - Linear and nonlinear SVMs
- Difficulties with nonlinear SVMs for large problems
  - Storage and computational complexity
- Reduced Support Vector Machines
- Incremental Reduced Support Vector Machines
- Numerical Results
- Conclusions

3 Support Vector Machines (SVMs): Powerful Tools for Data Mining
- SVMs have a sound theoretical foundation
  - Based on statistical learning theory
- SVMs can be generated very efficiently and have high accuracy
- SVMs have an optimally defined separating surface
- SVMs have become the most promising learning algorithm for classification and regression
- SVMs can be extended from the linear to the nonlinear case
  - By using kernel functions

4 Support Vector Machines for Classification: Maximizing the Margin between Bounding Planes
[Figure: bounding planes separating the point sets A+ and A-]

5 Support Vector Machine Formulation
- Solve the quadratic program (QP) for some positive penalty parameter, where each point's label (+1 or -1) denotes A+ or A- membership
- SSVM: Smooth Support Vector Machine is an efficient SVM algorithm proposed by Yuh-Jye Lee
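The formula on this slide is not preserved in the transcript. As a point of reference only, the standard soft-margin SVM quadratic program matches the surviving wording. All symbols here are assumptions rather than the authors' notation: $A \in R^{m \times n}$ holds the training points as rows, $D$ is a diagonal matrix with $D_{ii} = \pm 1$ marking A+ or A- membership, $e$ is a vector of ones, $\xi$ is the slack vector, and $C > 0$ is the penalty parameter.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & C\, e^{\top}\xi + \tfrac{1}{2}\lVert w\rVert_2^2 \\
\text{s.t.}\quad & D(Aw + e\,b) + \xi \ge e, \\
& \xi \ge 0 .
\end{aligned}
```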

6 Nonlinear Support Vector Machine
- Extend to nonlinear cases by using kernel functions
- Map data from the input space to a higher dimensional feature space where the data can be separated linearly
- The value of the kernel function represents the inner product in the feature space
- Nonlinear SVM formulation: the QP is posed with a kernel matrix in place of the data matrix
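To make the kernel idea concrete, here is a minimal sketch of a kernel matrix computation. The slide does not say which kernel the authors use; the Gaussian (RBF) kernel, the function name `rbf_kernel`, and the parameter `gamma` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||A_i - B_j||^2).

    A is (m, n), B is (p, n); the result is (m, p). Each entry plays the role
    of an inner product between two points in the implicit feature space.
    """
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(A, A, gamma=0.5)  # full square kernel, one row/column per point
```

The full kernel `K` is square and dense: every pair of training points contributes one entry.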

7 Difficulties with Nonlinear SVM for Large Problems
- Separating surface depends on almost the entire dataset
  - Need to store the entire dataset after solving the problem
- The nonlinear kernel $K(A, A') \in R^{m \times m}$ is fully dense, where $m$ is the number of training points
  - Long CPU time to compute $m \times m$ numbers
  - Runs out of memory while storing the $m \times m$ kernel matrix
- Computational complexity depends on $m$
  - Complexity of nonlinear SSVM $\sim O((m+1)^3)$
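A quick back-of-the-envelope calculation shows why storing the full kernel becomes a problem. The dataset size of 100,000 points and the 8-byte double-precision entries are illustrative assumptions, not figures from the talk.

```python
def kernel_storage_gib(rows, cols, bytes_per_entry=8):
    """Memory needed for a dense rows x cols matrix, in GiB."""
    return rows * cols * bytes_per_entry / 2**30

m = 100_000                               # hypothetical number of training points
full_gib = kernel_storage_gib(m, m)       # dense m x m kernel
print(f"full kernel: {full_gib:.1f} GiB")
```

At this size the dense kernel alone needs roughly 74.5 GiB, before any solver workspace is counted.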

8 Reduced Support Vector Machines: Overcoming Computational & Storage Difficulties by Using a Rectangular Kernel
- Choose a small random sample $\bar{A} \in R^{\bar{m} \times n}$ of the rows of the data matrix $A \in R^{m \times n}$
- The small random sample $\bar{A}$ is a representative sample of the entire dataset
- Typically $\bar{A}$ is 1% to 10% of the rows of $A$
- Replace $K(A, A')$ by the rectangular kernel $K(A, \bar{A}')$, with the corresponding labels, in nonlinear SSVM
- Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel
- Computational complexity reduces to $O((\bar{m}+1)^3)$
- The nonlinear separator only depends on $\bar{A}$
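The random reduced set and its rectangular kernel can be sketched as follows. This is an illustration under assumptions, not the authors' code: the Gaussian kernel, the helper names `rbf_kernel` and `rectangular_kernel`, and the 5% sampling fraction are all chosen for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix (assumed kernel choice), shape (len(A), len(B))."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def rectangular_kernel(A, fraction=0.05, gamma=1.0, seed=0):
    """Sample a fraction of the rows of A and build the m x m_bar kernel."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    m_bar = max(1, int(round(fraction * m)))
    idx = rng.choice(m, size=m_bar, replace=False)  # random reduced set
    return rbf_kernel(A, A[idx], gamma), idx

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 3))             # synthetic data for illustration
K_rect, idx = rectangular_kernel(A, fraction=0.05)
print(K_rect.shape)                       # (200, 10) instead of (200, 200)
```

Only the sampled rows are needed to evaluate the kernel against new points, which is why the separator depends on the reduced set alone.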

9 The Reduced Set Plays the Most Important Role in RSVM
- It is natural to raise two questions:
  - Is there a way to choose the reduced set other than random selection so that RSVM will have better performance?
  - Is there a mechanism that determines the size of the reduced set automatically or dynamically?
- The incremental reduced support vector machine is proposed to answer these questions

10 Our Observations (Ⅰ)
- The nonlinear separating surface is a linear combination of a set of kernel functions
- If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited
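The surface formula itself is missing from the transcript. One common way to write such a surface, given here only as an assumed form, uses a reduced set with rows $\bar{A}_i$, coefficients $\bar{u}_i$, and an offset $b$ (all hypothetical symbols):

```latex
f(x) \;=\; \sum_{i=1}^{\bar{m}} \bar{u}_i \, K(x, \bar{A}_i') + b \;=\; 0
```

If two reduced points $\bar{A}_i$ and $\bar{A}_j$ are nearly identical, the functions $K(\cdot, \bar{A}_i')$ and $K(\cdot, \bar{A}_j')$ are nearly identical too, so the second one adds almost nothing to the set of surfaces that can be represented.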

11 Our Observations (Ⅱ)
- Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current function set
- These points contribute the most extra information

12 How to Measure Dissimilarity? Solving Least Squares Problems
- The information criterion is: the distance from the kernel vector $K(A, x')$ of a candidate point $x$ to the column space of the current reduced kernel matrix $K(A, \bar{A}')$ is greater than a threshold
- This distance can be determined by solving a least squares problem
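The criterion can be written out as a generic least squares problem. The formulas on the slide are not preserved, so the notation below is an assumption: $\bar{K} = K(A, \bar{A}')$ is the current reduced kernel matrix, $k = K(A, x')$ is the kernel vector of the candidate point, and $\delta$ is the threshold.

```latex
\beta^{*} = \arg\min_{\beta} \; \lVert \bar{K}\beta - k \rVert_2^2 ,
\qquad
r(x) = \lVert \bar{K}\beta^{*} - k \rVert_2 ,
\qquad
\text{add } x \text{ to the reduced set if } r(x) > \delta .
```

When $\bar{K}$ has full column rank, the minimizer is unique and given by the normal equations, $\beta^{*} = (\bar{K}^{\top}\bar{K})^{-1}\bar{K}^{\top}k$.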

13 Dissimilarity Measurement: Solving Least Squares Problems
- The least squares problem has a unique solution, and the distance is the norm of its residual
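A minimal sketch of this distance computation using NumPy's least squares solver. The function name `distance_to_column_space` and the toy matrix are illustrative assumptions; uniqueness of the solution holds when the matrix has full column rank.

```python
import numpy as np

def distance_to_column_space(K, k):
    """Euclidean distance from vector k to the column space of matrix K."""
    beta, *_ = np.linalg.lstsq(K, k, rcond=None)  # least squares coefficients
    return float(np.linalg.norm(K @ beta - k))    # norm of the residual

# Toy example: the columns of K span the first two coordinate axes of R^3.
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
inside = np.array([2.0, -3.0, 0.0])   # lies in the column space
outside = np.array([2.0, -3.0, 4.0])  # sticks out by 4 along the third axis

d_inside = distance_to_column_space(K, inside)
d_outside = distance_to_column_space(K, outside)
```

A vector already representable by the current columns has distance close to zero, so the corresponding point would not be added.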

14 IRSVM Algorithm Pseudo-code (Sequential Version)
1 Randomly choose two data points from the training data as the initial reduced set
2 Compute the reduced kernel matrix
3 For each data point not in the reduced set
4   Compute its kernel vector
5   Compute the distance from the kernel vector
6   to the column space of the current reduced kernel matrix
7   If its distance exceeds a certain threshold
8     Add this point to the reduced set and form the new reduced kernel matrix
9 Until several successive failures have happened in line 7
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
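Steps 1 to 9 of the pseudo-code above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the Gaussian kernel, the random visiting order, the function names, and the parameters `threshold` and `max_failures` (the number of successive failures that stops the scan) are all hypothetical. Steps 10 and 11, solving the SVM on the resulting reduced kernel, are left out.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix (assumed kernel choice)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def select_reduced_set_sequential(A, threshold, gamma=1.0,
                                  max_failures=10, seed=0):
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = [int(i) for i in order[:2]]           # step 1: two random points
    K = rbf_kernel(A, A[reduced], gamma)            # step 2: reduced kernel
    failures = 0
    for i in order[2:]:                             # step 3
        k = rbf_kernel(A, A[[i]], gamma)[:, 0]      # step 4: kernel vector
        beta, *_ = np.linalg.lstsq(K, k, rcond=None)
        dist = np.linalg.norm(K @ beta - k)         # steps 5-6: distance
        if dist > threshold:                        # step 7
            reduced.append(int(i))                  # step 8: grow reduced set
            K = np.column_stack([K, k])
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:            # step 9: stop criterion
                break
    return reduced, K

rng = np.random.default_rng(42)
A = rng.normal(size=(60, 2))                        # synthetic data
reduced, K = select_reduced_set_sequential(A, threshold=1e-3)
tiny, K_tiny = select_reduced_set_sequential(A, threshold=1e9)
```

With an unreachably large threshold no candidate is ever accepted, so the reduced set stays at its two initial points; a smaller threshold lets it grow until candidates stop adding new information.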

15 Speeding Up IRSVM
- Note that we have to solve the least squares problem many times, so its time complexity matters
- The main cost depends on the reduced kernel matrix, not on the kernel vector being tested
- Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once
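One way to exploit this, sketched below, is to pass all kernel vectors of a batch to the solver as the columns of a single right-hand-side matrix, so the expensive work on the reduced kernel matrix is done once per batch. `numpy.linalg.lstsq` accepts a two-dimensional right-hand side; the function name `batch_distances` is an illustrative assumption.

```python
import numpy as np

def batch_distances(K, K_batch):
    """Distances from each column of K_batch to the column space of K."""
    # One call solves the least squares problem for every column at once.
    beta, *_ = np.linalg.lstsq(K, K_batch, rcond=None)
    return np.linalg.norm(K @ beta - K_batch, axis=0)

K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
K_batch = np.array([[2.0, 2.0],
                    [-3.0, -3.0],
                    [0.0, 4.0]])      # two candidate kernel vectors as columns

dists = batch_distances(K, K_batch)   # one distance per candidate
```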

16 IRSVM Algorithm Pseudo-code (Batch Version)
1 Randomly choose two data points from the training data as the initial reduced set
2 Compute the reduced kernel matrix
3 For each batch of data points not in the reduced set
4   Compute their kernel vectors
5   Compute the corresponding distances from these kernel vectors
6   to the column space of the current reduced kernel matrix
7   For those points whose distances exceed a certain threshold
8     Add those points to the reduced set and form the new reduced kernel matrix
9 Until no data points in a batch were added in lines 7 and 8
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
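The batch loop above (steps 1 to 9) can be sketched the same way. As before, this is an assumption-laden illustration rather than the authors' code: the Gaussian kernel, the function names, and the parameters `threshold` and `batch_size` are hypothetical, and the final SVM solve in steps 10 and 11 is omitted.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix (assumed kernel choice)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def select_reduced_set_batch(A, threshold, batch_size=20, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = [int(i) for i in order[:2]]               # step 1
    K = rbf_kernel(A, A[reduced], gamma)                # step 2
    for start in range(2, len(order), batch_size):      # step 3
        batch = order[start:start + batch_size]
        K_batch = rbf_kernel(A, A[batch], gamma)        # step 4
        beta, *_ = np.linalg.lstsq(K, K_batch, rcond=None)
        dists = np.linalg.norm(K @ beta - K_batch, axis=0)  # steps 5-6
        keep = dists > threshold                        # step 7
        if not keep.any():                              # step 9: nothing added
            break
        reduced.extend(int(i) for i in batch[keep])     # step 8
        K = np.hstack([K, K_batch[:, keep]])
    return reduced, K

rng = np.random.default_rng(42)
A = rng.normal(size=(60, 2))                            # synthetic data
reduced, K = select_reduced_set_batch(A, threshold=1e-3)
tiny, K_tiny = select_reduced_set_batch(A, threshold=1e9)
```

Because every point in a batch is tested against the same reduced kernel matrix, two mutually similar points in one batch can both be accepted; that is the price of solving the least squares problem once per batch.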

17 IRSVM on four public data sets

18 Conclusions
- IRSVM: an advanced algorithm of RSVM
  - Starts with an extremely small reduced set and sequentially expands it to include informative data points
  - Determines the size of the reduced set automatically and dynamically rather than requiring it to be pre-specified
  - The reduced set generated by IRSVM will be more representative
- All advantages of RSVM for dealing with large scale nonlinear classification problems are retained
- Experimental tests show that IRSVM used a smaller reduced set without sacrificing classification accuracy
