
1 CISC 841 Bioinformatics (Fall 2007): Kernel engineering and applications of SVMs

2 Kernels. Given a mapping Φ(·) from the space of input vectors to some higher-dimensional feature space, the kernel K of two vectors x_i, x_j is the inner product of their images in the feature space: K(x_i, x_j) = Φ(x_i)·Φ(x_j). Since we need only the inner products of vectors in the feature space to find the maximal-margin separating hyperplane, we can use the kernel in place of the mapping Φ(·). Because the inner product of two vectors measures the distance between them, a kernel function actually defines the geometry of the feature space (lengths and angles) and implicitly provides a similarity measure for the objects to be classified.
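A minimal numerical check of this identity, using the degree-2 polynomial kernel in two dimensions (the feature map phi and the test vectors are illustrative, not from the slides):

import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D:
    (v1, v2) -> (v1^2, v2^2, sqrt(2)v1v2, sqrt(2)v1, sqrt(2)v2, 1)."""
    v1, v2 = v
    return np.array([v1**2, v2**2, np.sqrt(2)*v1*v2,
                     np.sqrt(2)*v1, np.sqrt(2)*v2, 1.0])

def poly_kernel(x, y):
    """K(x, y) = (x.y + 1)^2, computed without ever forming phi."""
    return (np.dot(x, y) + 1.0) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# The two quantities agree: the kernel is the inner product of the images.
print(poly_kernel(x, y))          # 4.0
print(np.dot(phi(x), phi(y)))     # 4.0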

3 Mercer's condition. Since kernel functions play such an important role, it is important to know whether a kernel actually gives dot products in some higher-dimensional space. For a kernel K(x, y): if for every g(x) such that ∫ g(x)² dx is finite we have ∫∫ K(x, y) g(x) g(y) dx dy ≥ 0, then there exists a mapping Φ such that K(x, y) = Φ(x)·Φ(y). Notes: (1) Mercer's condition does not tell how to actually find Φ. (2) Mercer's condition may be hard to check, since it must hold for every g(x).
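In practice one often falls back on the equivalent finite-sample condition: every Gram matrix built from a Mercer kernel must be positive semidefinite. A sketch of that check on random sample data (passing on one sample is necessary evidence, not a proof):

import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Gram matrix of the candidate kernel on the sample.
G = np.array([[rbf(a, b) for b in X] for a in X])

# Necessary condition: all eigenvalues >= 0 (up to round-off).
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True for the RBF kernel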

4 Some commonly used generic kernel functions:
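The slide's table itself is an image; the generic kernels usually listed in this setting are the linear, polynomial, Gaussian (RBF), and sigmoid kernels. A sketch (parameter values are illustrative):

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)                            # K(x, y) = x . y

def polynomial_kernel(x, y, c=1.0, d=3):
    return (np.dot(x, y) + c) ** d                 # K(x, y) = (x . y + c)^d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # Gaussian / RBF

def sigmoid_kernel(x, y, kappa=1.0, theta=-1.0):
    # Satisfies Mercer's condition only for some (kappa, theta).
    return np.tanh(kappa * np.dot(x, y) + theta)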

5 Advanced issues
- Soft margin: allow misclassification, but with penalties.
- Multiclass classification. Indirect: combine multiple binary classifiers into a single multiclass classifier. Direct: generalize binary classification methods.
- SVM regression.
- Unsupervised learning: support vector clustering (Ben-Hur et al., 2001).

6 Soft margin
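A sketch of the soft margin in practice, using scikit-learn's SVC on toy overlapping data (the library and data are my additions): the penalty parameter C trades margin width against misclassification, with small C tolerating errors and large C approaching the hard margin.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two overlapping Gaussian classes: not linearly separable, so a hard
# margin does not exist and slack variables are needed.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C -> wider margin, more (penalized) violations, and
    # typically more support vectors.
    print(C, len(clf.support_))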

7 How to construct a kernel?
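One standard answer, sketched here on random sample data (an illustration of the usual closure rules, not necessarily what the slides showed): sums, positive scalings, and pointwise products of kernels are again kernels, as the eigenvalues of the resulting Gram matrices confirm.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 4))

def gram(k):
    return np.array([[k(a, b) for b in X] for a in X])

K1 = gram(lambda a, b: np.dot(a, b))                         # linear
K2 = gram(lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2)))  # RBF

def is_psd(K):
    return np.linalg.eigvalsh(K).min() > -1e-10

# Closure rules: K1 + K2, c*K1 (c > 0), and K1 * K2 (elementwise,
# by the Schur product theorem) are all valid kernels.
print(is_psd(K1 + K2), is_psd(3.0 * K1), is_psd(K1 * K2))  # True True True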


11 Diffusion kernels for graphs: K = exp(-βL), where β is a constant and L = D - A is the graph Laplacian (D is the diagonal degree matrix and A is the adjacency matrix of the graph). Intuition: two nodes get closer in the feature space if there are many short paths connecting them in the graph.
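A small runnable example of the diffusion kernel, on a toy path graph:

import numpy as np
from scipy.linalg import expm

# Path graph 0-1-2-3-4 as an adjacency matrix A.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

D = np.diag(A.sum(axis=1))   # diagonal degree matrix
L = D - A                    # graph Laplacian

beta = 1.0
K = expm(-beta * L)          # diffusion kernel K = exp(-beta * L)

# Nodes joined by short paths are closer in feature space:
print(K[0, 1] > K[0, 4])     # True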

12 Design ternary (three-class) classifiers: H for helix, E for beta sheet, C for coil.

13 Nguyen & Rajapakse, Genome Informatics 14:218-227 (2003).

14 A two-stage SVM
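A sketch of the two-stage idea on toy data, in the spirit of Nguyen & Rajapakse rather than their exact setup (random features and labels; window size and kernels are placeholders): a first SVM scores each residue, and a second SVM re-classifies each residue from a window of first-stage scores, capturing correlations among neighboring predictions.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n, d, w = 300, 10, 2                 # residues, features per residue, half-window

X = rng.normal(size=(n, d))          # stage-1 features (stand-in for profiles)
y = rng.integers(0, 3, size=n)       # labels: 0=C, 1=H, 2=E (toy labels)

svm1 = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)
scores = svm1.decision_function(X)   # (n, 3) per-residue class scores

# Stage 2: each residue is described by the stage-1 scores of a window
# of 2*w+1 neighboring residues (zero-padded at the sequence ends).
padded = np.vstack([np.zeros((w, 3)), scores, np.zeros((w, 3))])
X2 = np.hstack([padded[i:i + n] for i in range(2 * w + 1)])

svm2 = SVC(kernel="rbf").fit(X2, y)
print(svm2.predict(X2[:10]))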

15 Applications in Bioinformatics. W. Noble, "Support vector machine applications in computational biology", in Kernel Methods in Computational Biology, B. Schoelkopf, K. Tsuda and J.-P. Vert, eds., MIT Press, 2004.


18 SVM-Fisher method: (1) train an HMM on protein homologs; (2) derive positive gradient vectors from the homologs (positive training set) and negative gradient vectors from protein non-homologs (negative training set); (3) train a support vector machine on these vectors; (4) binary classification of a target protein of unknown function (testing data).

19 HMM gradients. Fisher score: U_X = ∇_θ log P(X | H, θ). The gradient of a sequence X with respect to a given model is computed using the forward-backward algorithm. Each dimension corresponds to one parameter of the model. The feature space is tailored to the sequences from which the model was trained.
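A sketch of the Fisher-score idea with a model simple enough to dispense with forward-backward: an i.i.d. Bernoulli model stands in for the trained HMM, and finite differences stand in for the analytic gradient.

import numpy as np

def loglik(theta, x):
    """Log-likelihood of a binary sequence x under an i.i.d. Bernoulli
    'model' with parameter theta (stand-in for a trained HMM)."""
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

def fisher_score(x, theta=0.6, eps=1e-6):
    """U_x = d/dtheta log P(x | theta), by central finite differences.
    For a real HMM, forward-backward delivers this gradient, one
    component per model parameter."""
    return (loglik(theta + eps, x) - loglik(theta - eps, x)) / (2 * eps)

x1 = np.array([1, 1, 0, 1])
x2 = np.array([0, 0, 1, 0])

# Fisher kernel: inner product of the score vectors (1-D here).
print(fisher_score(x1) * fisher_score(x2))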

20 Combining pairwise similarity with SVMs for protein homology detection: (1) compute positive pairwise score vectors from protein homologs (positive training set) and negative pairwise score vectors from protein non-homologs (negative training set); (2) train a support vector machine on these vectors; (3) binary classification of a target protein of unknown function (testing data).

21 Experiment: known protein families (Jaakkola, Diekhans and Haussler, 1999).

22 Vectorization
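A sketch of the vectorization step (the similarity function here is a toy stand-in for, e.g., a normalized Smith-Waterman score): each protein becomes the vector of its pairwise scores against a fixed vectorization set, and those vectors feed the SVM.

import numpy as np

def similarity(a, b):
    """Placeholder pairwise score: fraction of matching positions.
    Stands in for a normalized Smith-Waterman or BLAST score."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

vectorization_set = ["MKVLA", "MKILA", "GGHST", "PQRWY"]
proteins = {"query1": "MKVLS", "query2": "GGHSA"}

# Each protein -> vector of scores against the vectorization set.
features = {name: np.array([similarity(seq, ref) for ref in vectorization_set])
            for name, seq in proteins.items()}
print(features["query1"])   # one coordinate per reference protein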


24 A measure of sensitivity and specificity. ROC: the receiver operating characteristic score is the normalized area under a curve that plots true positives as a function of false positives (ROC = 1 for a perfect ranking, ROC = 0 for a completely inverted one; the slide's example curve has ROC = 0.67).
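Computing the ROC score from SVM decision values, using scikit-learn (toy numbers):

import numpy as np
from sklearn.metrics import roc_auc_score

# SVM decision values for 4 positives and 4 negatives.
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([2.1, 1.3, 0.4, -0.2, 0.9, -0.5, -1.1, -1.8])

# Normalized area under the TP-vs-FP curve:
# 1.0 = perfect ranking, 0.5 = random, 0.0 = perfectly inverted.
print(roc_auc_score(labels, scores))   # 0.875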

25 Performance Comparison (1)

26 Using phylogenetic profiles & SVMs. Example gene YAL001C: each genome contributes a BLAST E-value, which is thresholded into one bit of the binary phylogenetic profile: 0.122→1, 1.064→0, 3.589→0, 0.008→1, 0.692→1, 8.49→0, 14.79→0, 0.584→1, 1.567→0, 0.324→1, 0.002→1, 3.456→0, 2.135→0, 0.142→1, 0.001→1, 0.112→1, 1.274→0, 0.234→1, 4.562→0, 3.934→0, 0.489→1, 0.002→1, 2.421→0, 0.112→1.
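The thresholding step in code (the E-values echo the slide; a cutoff of 1.0 reproduces its profile bits, though the cutoff actually used is not stated):

import numpy as np

# E-values of BLAST hits for one gene (e.g. YAL001C) against a set of
# genomes (first twelve values from the slide).
evalues = np.array([0.122, 1.064, 3.589, 0.008, 0.692, 8.49,
                    14.79, 0.584, 1.567, 0.324, 0.002, 3.456])

# Binary phylogenetic profile: 1 if a significant homolog is present.
threshold = 1.0
profile = (evalues < threshold).astype(int)
print(profile)   # [1 0 0 1 1 0 0 1 0 1 1 0]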

27 Phylogenetic profiles and evolution patterns. A profile x (e.g. 1 1 1 1 1 0 0 1 1 0 1 0 0 0 1 1 0) is the leaf observation of some evolution pattern on the phylogenetic tree, but it is impossible to know for sure if the gene followed exactly this evolution pattern.

28 Tree kernel (Vert, 2002). For a phylogenetic profile x and an evolution pattern e: P(e) quantifies how "natural" the pattern is, and P(x|e) quantifies how likely the pattern e is the "true history" of the profile x. Tree kernel: K_tree(x, y) = Σ_e P(e) P(x|e) P(y|e). This can be proved to be a kernel. Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
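A brute-force toy version of this sum (a uniform prior P(e) and an i.i.d. bit-flip likelihood replace Vert's tree-structured model, which evaluates the same sum efficiently by dynamic programming on the tree):

import numpy as np
from itertools import product

def tree_kernel(x, y, eps=0.1):
    """Toy K(x, y) = sum_e P(e) P(x|e) P(y|e) over all binary
    'evolution patterns' e of the same length as the profiles."""
    n = len(x)
    def lik(profile, e):
        match = np.sum(profile == e)
        return (1 - eps) ** match * eps ** (n - match)
    total = 0.0
    for e in product([0, 1], repeat=n):
        e = np.array(e)
        total += (0.5 ** n) * lik(x, e) * lik(y, e)   # uniform P(e)
    return total

x = np.array([1, 1, 0, 1, 0, 0, 0, 1])
y = np.array([1, 1, 0, 1, 0, 1, 0, 1])   # differs from x in one bit
z = np.array([0, 0, 1, 0, 1, 1, 1, 0])   # complement of x

# Profiles sharing likely common histories score higher:
print(tree_kernel(x, y) > tree_kernel(x, z))   # True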

29 Tree-encoded profile (Narra & Liao, 2004): the binary profile at the tree's leaves (e.g. 1 1 0 1 0 0 0 1 1 0 1 0) is augmented with scores computed at the internal nodes (0.33, 0.67, 0.34, 0.5, 0.75, 0.55), giving an encoded vector that also carries the tree topology.

30 Tree-encoded polynomial kernel

31 Prediction of protein-protein interactions based on co-evolution (correlated mutations). Credit: Valencia & Pazos.

32 Mirror Tree (Pazos and Valencia, 2001)
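A sketch of the mirror-tree score (toy distance matrices; real ones would come from aligned ortholog families): the Pearson correlation between the two families' inter-species distance matrices, with high correlation suggesting co-evolution and hence possible interaction.

import numpy as np

def mirror_tree_score(D1, D2):
    """Pearson correlation between the upper triangles of two
    inter-ortholog distance matrices."""
    iu = np.triu_indices_from(D1, k=1)
    return np.corrcoef(D1[iu], D2[iu])[0, 1]

rng = np.random.default_rng(4)
base = rng.random((6, 6)); base = (base + base.T) / 2
np.fill_diagonal(base, 0)

D_a = base                                     # family A distances
D_b = base + rng.normal(0, 0.05, base.shape)   # family B: similar tree
D_b = (D_b + D_b.T) / 2; np.fill_diagonal(D_b, 0)
D_c = rng.random((6, 6)); D_c = (D_c + D_c.T) / 2
np.fill_diagonal(D_c, 0)                       # unrelated family

print(mirror_tree_score(D_a, D_b))   # close to 1: likely co-evolving
print(mirror_tree_score(D_a, D_c))   # near 0: unrelated histories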

33 Method: TreeSec. [Figure: the matrix of pairwise distances among orthologs from species A-H is reorganized, guided by the phylogenetic tree, into sections (section 1, section 2, ...) that make up the feature vector.]

34 Hybrid/composite kernel. Two alternative views: (a) as a hybrid that employs in tandem both an explicit mapping from phylogenetic vector to super-phylogenetic vector, x → Φ_T(x), and a generic kernel K(a, b) = exp(-σ|a - b|²); or (b) as a conventional SVM with a specially engineered kernel composed of the two steps above: K'(x, y) = exp(-σ|Φ_T(x) - Φ_T(y)|²).
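The two views in code (phi_T here is a placeholder map so the sketch runs, not the actual super-phylogenetic encoding):

import numpy as np

def phi_T(x):
    """Placeholder for the explicit map from a phylogenetic vector to a
    'super-phylogenetic' vector (e.g. a TreeSec-style encoding)."""
    return np.concatenate([x, np.cumsum(x)])

def composite_kernel(x, y, sigma=0.1):
    """K'(x, y) = exp(-sigma * |phi_T(x) - phi_T(y)|^2): the explicit
    map followed by a generic RBF kernel, packaged as one kernel."""
    d = phi_T(x) - phi_T(y)
    return np.exp(-sigma * np.dot(d, d))

x = np.array([1.0, 0.0, 1.0, 1.0])
y = np.array([1.0, 0.0, 0.0, 1.0])
print(composite_kernel(x, y))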

35 Results. Dataset (Sato et al., 2005): 13 pairs of interacting proteins from E. coli (DIP); 41 bacterial genomes selected from KEGG/KO (Kanehisa et al., 2004). Cross-validation experiments: leave-one-out.

Classification            | Data representation         | ROC
Unsupervised (Pearson CC) | MirrorTree                  | 0.6587
Unsupervised (Pearson CC) | TreeSec                     | 0.6731
Polynomial kernel SVM     | MirrorTree                  | 0.7067
Polynomial kernel SVM     | TreeSec                     | 0.7452
Polynomial kernel SVM     | TreeSec x 10                | 0.8446
Polynomial kernel SVM     | TreeSec x 10 (no NJ)        | 0.6907
Polynomial kernel SVM     | TreeSec x 10 (random tree)  | 0.5529

36 Linear kernel with adaptive weights. Kernel: K(x, y) = Φ(x)·Φ(y). Linear kernel: K(x, y) = x·y. Weighted linear kernel: K(x, y) = Σ_i λ_i x_i y_i — but how should the weights λ_i be set?

37 Least Squares SVM (Suykens & Vandewalle, 1999)
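A sketch of LS-SVM training on toy data (RBF kernel; hyperparameters are illustrative): in Suykens & Vandewalle's formulation, equality constraints replace the inequalities, so training reduces to solving one linear system instead of a quadratic program.

import numpy as np

def lssvm_train(X, y, gamma=1.0, sig2=1.0):
    """Solve the LS-SVM linear system
        [ 0    y^T          ] [b]     [0]
        [ y    Omega + I/g  ] [a]  =  [1]
    with Omega_kl = y_k y_l K(x_k, x_l)."""
    n = len(y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sig2))                 # RBF Gram matrix
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                       # bias b, multipliers alpha

def lssvm_predict(X, y, Xtest, b, alpha, sig2=1.0):
    sq = np.sum((Xtest[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sig2))
    return np.sign(K @ (alpha * y) + b)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
b, alpha = lssvm_train(X, y)
print(np.mean(lssvm_predict(X, y, X, b, alpha) == y))  # training accuracy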


39 Adaptive weights. The weights λ can be obtained, simultaneously with the multipliers α, by solving the above linear equations.

40 Results.

Classification scheme     | Method                         | ROC
Unsupervised (Pearson CC) | MirrorTree                     | 0.659
Unsupervised (Pearson CC) | TreeSec                        | 0.673
Supervised (SVMs)         | MirrorTree                     | 0.707
Supervised (SVMs)         | TreeSec x 10                   | 0.845
Supervised (SVMs)         | LS-SVM                         | 0.705
Supervised (SVMs)         | Weighted linear kernel LS-SVM  | 0.928

41 References and resources
- www.kernel-machines.org (SVMLight)
- W. Noble, "Support vector machine applications in computational biology", in Kernel Methods in Computational Biology, B. Schoelkopf, K. Tsuda and J.-P. Vert, eds., MIT Press, 2004.
- Craig RA and Liao L. BMC Bioinformatics, 2007, 8:6.
- Craig RA and Liao L. "Improving Protein-Protein Interaction Prediction based on Phylogenetic Information using Least-Squares SVM." Ann. NY Acad. Sci. (in press).

