
A Comparative Study of Kernel Methods for Classification Applications Yan Liu Oct 21, 2003.

1 A Comparative Study of Kernel Methods for Classification Applications Yan Liu Oct 21, 2003

2 Introduction
- Support Vector Machines
  - Text classification
  - Protein classification
- Various kernels
  - Standard kernels: linear kernels, polynomial kernels, RBF kernels
  - Other application-oriented kernels: latent semantic kernels, Fisher kernels, string kernels, etc.
- Problem definition
  - Rare-class problem (unbalanced data)
  - Noisy-data problem
  - Multi-label problem
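The standard kernels named above have simple closed forms. As an illustration (parameter names and values here are generic choices, not taken from the presentation):

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return np.dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    # K(x, y) = (<x, y> + c)^d
    return (np.dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])
print(linear_kernel(x, y))      # 2.0
print(polynomial_kernel(x, y))  # (2 + 1)^3 = 27.0
print(rbf_kernel(x, y))         # exp(-0.5 * 3) = exp(-1.5)
```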

3 Text Classification
- Kernels
  - Linear kernels
  - Latent semantic kernels
- Problem focus
  - Rare-class problem
  - Multi-label problem
  - Noisy-data problem
- Dataset
  - Reuters-21578 dataset

4 Data Analysis: Reuters-21578
- The corpus consists of 7,769 training documents and 3,019 test documents, mapped to 90 categories
- Rare-class problem (unbalanced data)

5 Data Analysis: Reuters-21578
- Multi-label problem
  - Definition: one document belongs to more than one category
  - The average document-to-category ratio is 1.271 for the training set

6 Methodology and Schedule
- Analyze the properties of the application data and propose conjectures about possible behaviors
  - Project high-dimensional data to a low-dimensional space
    - Singular Value Decomposition (SVD)
    - Reduced-rank Linear Discriminant Analysis (LDA)
  - Propose hypotheses
- Work on synthetic datasets to test the hypotheses
  - Generate low-dimensional synthetic data with properties similar to the real data
  - Test the hypotheses
- Map from synthetic data back to the real application data
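The SVD projection step above can be sketched as follows. This is a minimal illustration, not the actual preprocessing pipeline; the data, dimensions, and rank k = 2 are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))  # e.g. 50 documents, 100 term features

# Thin SVD of the data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top-k right singular vectors and project onto them
k = 2
X_low = X @ Vt[:k].T
print(X_low.shape)  # (50, 2)
```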

7 Case 1: Multi-label Problem (Reuters-21578)
- Conceptually two cases: (1) whole vs. part
  - Example: Wheat vs. Grain

8 Case 1: Multi-label Problem (Synthetic Data)
- Data generation
  - Gaussian mixture models, 200 data points in total
  - Class 1: red; Class 2: green; Class 1 & 2: blue
- Hypotheses
  - Linear kernel: predicts everything as class 1
  - LSI kernel: hard to say; maybe similar to the linear kernel?
  - RBF kernel: fits the data better than the linear kernel?
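A minimal sketch of generating this kind of two-class Gaussian-mixture data with an overlap region that carries both labels. The means, spreads, and overlap band are invented for illustration; the presentation does not give the actual mixture parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
class1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))  # red
class2 = rng.normal(loc=[3.0, 0.0], scale=1.0, size=(n, 2))  # green
X = np.vstack([class1, class2])                              # 200 points total

# Points falling in the band between the two means get both labels (blue)
both = (X[:, 0] > 1.0) & (X[:, 0] < 2.0)
print(X.shape, int(both.sum()))
```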

9 Case 1: Multi-label Problem (Results on Synthetic Data)
- Linear kernel results:
  - Class 1: Prec: 0.985000, Rec: 1.000000, F1: 0.992443
  - Class 2: Prec: 0, Rec: 0, F1: 0
  - Class 1 & 2: Prec: 0, Rec: 0
- Discussion
  - The result on Class 1 & 2 depends on the proportion mp = (# of multi-label examples) / (# of training examples)
  - If mp > 0.5, then Rec = 1.00 and Prec = mp
  - If mp < 0.5, then Rec = 0 and Prec = 0

10 Case 1: Multi-label Problem (Results on Synthetic Data)
- LSI kernel results: exactly the same as the linear kernel
  - Class 1: Prec: 0.985000, Rec: 1.000000, F1: 0.992443
  - Class 2: Prec: 0, Rec: 0, F1: 0
  - Class 1 & 2: Prec: 0, Rec: 0
- Discussion
  - On this data, LSI performs the same as the linear kernel
  - In the real application, LSI might behave differently

11 Case 1: Multi-label Problem (Results on Synthetic Data)
- RBF kernel results:
  - Class 1: Prec: 0.985000, Rec: 1.000000, F1: 0.992443
  - Class 2: Prec: 0.854167, Rec: 0.512500, F1: 0.640625
  - Class 1 & 2: Prec: 0.791667, Rec: 0.493506
- Discussion
  - The RBF kernel fits the data very well

12 Case 1: Multi-label Problem (Reuters-21578)
- Conceptually two cases: (2) shared concepts
  - Example: Wheat vs. Soybean

13 Case 1: Multi-label Problem (Synthetic Data)
- Data generation
  - Gaussian mixture models, 200 data points in total
  - Class 1: red; Class 2: green; Class 1 & 2: blue
- Hypotheses
  - Linear kernel: might work well for this case?
  - LSI kernel: also might work for this case?
  - RBF kernel: might overfit?

14 Case 1: Multi-label Problem (Results on Synthetic Data)
- Linear kernel results:
  - Class 1: Prec: 0.918699, Rec: 0.869231, F1: 0.893281
  - Class 2: Prec: 0.938462, Rec: 0.938462, F1: 0.938462
  - Class 1 & 2: Prec: 0.391304, Rec: 0.300000

15 Case 1: Multi-label Problem (Results on Synthetic Data)
- LSI kernel results:
  - Class 1: Prec: 0.928000, Rec: 0.892308, F1: 0.909804
  - Class 2: Prec: 0.938462, Rec: 0.938462, F1: 0.938462
  - Class 1 & 2: Prec: 0.440000, Rec: 0.366667

16 Case 1: Multi-label Problem (Results on Synthetic Data)
- RBF kernel results:
  - Class 1: Prec: 0.934426, Rec: 0.876923, F1: 0.904762
  - Class 2: Prec: 0.938462, Rec: 0.938462, F1: 0.938462
  - Class 1 & 2: Prec: 0.454545, Rec: 0.333333

17 Case 1: Multi-label Problem (Results on Synthetic Data)
- Discussion of results
  - The linear kernel performs reasonably well
  - The LSI kernel gains over the linear kernel by separating the data in the right direction
  - The RBF kernel tends to fit the data

18 Case 2: Rare-class Problem (Reuters-21578)
- Example: CPU vs. Wheat

19 Case 2: Rare-class Problem (Synthetic Data)
- Data generation
  - Gaussian mixture models, 103 data points in total
  - Class 1: red; Class 2: green
- Hypotheses
  - Both the linear kernel and the LSI kernel seem to perform reasonably well
  - RBF might overfit?

20 Case 2: Rare-class Problem (Results on Synthetic Data)
- Results for the linear, LSI, and RBF kernels (figures omitted)
- Question: where is the problem?

21 Case 2: Rare-class Problem (Results on Synthetic Data)
- Discussion
  - The problem lies in the SVM classifier, not in the kernel: SVM tries to maximize the margin
- Solutions
  - Set the cost function in the SVM classifier
  - Tune the decision threshold instead of using the default 0
  - Up-sampling, down-sampling, and ensemble approaches
- The analysis for different kernels will be difficult
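The first two fixes above can be sketched with scikit-learn (a stand-in for whatever SVM tool the experiments used): a class weight raises the misclassification cost on the rare class, and the decision threshold is moved away from the default 0. All data and parameter values here are invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # majority class
               rng.normal(2, 1, (5, 2))])    # rare class
y = np.array([0] * 100 + [1] * 5)

# Cost-sensitive SVM: errors on the rare class cost 20x more
clf = SVC(kernel="linear", class_weight={1: 20})
clf.fit(X, y)

# Tune the decision threshold instead of thresholding at 0
scores = clf.decision_function(X)
threshold = -0.2
pred = (scores > threshold).astype(int)
print(pred.shape, int(pred.sum()))
```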

22 Case 3: Noisy-data Problem (Synthetic Data)
- Data generation
  - Gaussian mixture models, 200 data points in total
  - Class 1: red; Class 2: green; Noise: blue
- Hypotheses
  - The linear kernel tends to be robust to noise
  - Little change for the LSI kernel, since the transformation is independent of the class labels
  - RBF might overfit?

23 Case 3: Noisy-data Problem (Results on Synthetic Data)
- Results for the linear, LSI, and RBF kernels (figures omitted)
  - The linear kernel and LSI kernel are robust to the noise
  - The RBF kernel tends to overfit

24 Summary
- Multi-label problem
  - Case 1: whole vs. part
    - Linear and LSI depend on the data distribution, but could work much better if we knew the category hierarchy
    - RBF seems to work better
  - Case 2: shared concepts
    - LSI works a little better than the linear kernel
- Rare-class problem
  - The problem lies in the SVM classifier, most seriously in the thresholding
- Noisy data
  - The linear kernel and LSI are robust to noise
  - RBF might overfit

25 Next Steps
- Work on the real application datasets and test the hypotheses
  - Reuters-21578
  - A subset of RCV-1
- Focus more on the multi-label problem

26 Protein Family Classification
- Kernel selection
  - Fisher kernels
  - String kernels
- Problem focus
  - Rare-class problem
  - Noisy-data problem
- Dataset
  - GPCR family classification dataset

27 Data Analysis: GPCR Family Classification
- The dataset consists of 1,356 sequences in 13 classes; each sequence has one and only one label
- Rare-class problem (unbalanced data)

28 Kernel Methods Revisited
- Fisher kernel
  - Build an HMM for each family
  - Compute the Fisher scores for each parameter of the HMM
  - Use the scores as features and predict with an SVM with an RBF kernel
- String kernels
  - k-spectrum kernel: all possible contiguous subsequences of length k (k = 3, 4), similar to using N-grams
  - Mismatch string kernel: an extension of the string kernel that allows mismatches (k = 5, 6)
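The k-spectrum kernel above can be written in a few lines: map each sequence to counts of its length-k contiguous subsequences, then take the dot product of the count vectors. The toy sequences are invented for illustration:

```python
from collections import Counter

def spectrum(seq, k):
    # Count every contiguous subsequence (k-mer) of length k
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def k_spectrum_kernel(s, t, k=3):
    # Dot product of the two k-mer count vectors
    a, b = spectrum(s, k), spectrum(t, k)
    return sum(a[kmer] * b[kmer] for kmer in a)

# "ABC" occurs twice in the first sequence, once in the second: 2 * 1 = 2
print(k_spectrum_kernel("ABCABC", "ABCD", k=3))  # 2
```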

29 Proposed Kernel: PSA Kernel
- Intuition
  - The kernel defines the similarity between two sequences in the Hilbert feature space
  - The similarity between two sequences is one of the basic, well-studied problems in bioinformatics
- Proposed kernel
  - K(x, y) is the pairwise sequence alignment score
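One way to use such a kernel in practice is to precompute the full Gram matrix of pairwise scores and hand it to an SVM. The sketch below uses scikit-learn's precomputed-kernel interface with a trivial stand-in score (matching positions), not a real alignment; in the presentation the scores come from pairwise sequence alignment with ClustalW:

```python
import numpy as np
from sklearn.svm import SVC

def toy_score(s, t):
    # Placeholder similarity: count of matching positions
    # (stands in for a real pairwise-alignment score)
    return sum(a == b for a, b in zip(s, t))

seqs = ["ACDEFG", "ACDEFH", "WWWWWW", "WWWWWV"]
labels = [0, 0, 1, 1]

# Precompute the Gram matrix K[i, j] = score(seq_i, seq_j)
K = np.array([[toy_score(s, t) for t in seqs] for s in seqs], dtype=float)

clf = SVC(kernel="precomputed")
clf.fit(K, labels)
print(clf.predict(K))
```

Note that this interface requires K to be positive semi-definite, which is exactly the open question listed for the PSA kernel on the next slide.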

30 Experimental Results and Ongoing Work
- Experimental results
  - Two-way cross-validation
  - Pairwise sequence alignment using ClustalW
  - Accuracy of 0.9550 on GPCR family classification over 13 classes, and 0.9834 over classes A through E
  - The SVM converges very fast
- Ongoing work
  - Proof that the kernel is positive semi-definite
  - Connection between the string kernel and the Fisher kernel
  - Experiments on other datasets
