1
CISC667, F05, Lec23, Liao
CISC 667 Intro to Bioinformatics (Fall 2005)
Support Vector Machines (II)
Bioinformatics Applications
5
Combining pairwise similarity with SVMs for protein homology detection
[Figure labels: (1) protein homologs and protein non-homologs; (2) positive pairwise score vectors (positive train) and negative pairwise score vectors (negative train); (3) support vector machine, binary classification; target protein of unknown function (testing data)]
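The vectorization step named on this slide can be sketched as follows. This is a minimal illustration, not the lecture's implementation: each protein is represented by its similarity scores against every training sequence. The function names, the made-up sequences, and the shared k-mer count used as a similarity score are all assumptions chosen to keep the example self-contained; a real pipeline would use a pairwise alignment score instead.

```python
from typing import Callable, List, Sequence


def kmer_similarity(a: str, b: str, k: int = 2) -> float:
    """Toy similarity: number of distinct k-mers shared by two sequences.

    Hypothetical stand-in for a real pairwise alignment score.
    """
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return float(len(kmers_a & kmers_b))


def pairwise_vector(seq: str,
                    train_seqs: Sequence[str],
                    score: Callable[[str, str], float] = kmer_similarity
                    ) -> List[float]:
    """Represent seq by its scores against every training sequence."""
    return [score(seq, t) for t in train_seqs]


# Made-up sequences, for illustration only.
positive_train = ["MKVLAA", "MKVLGA"]   # assumed homologs
negative_train = ["GGSPTT", "PTTGGS"]   # assumed non-homologs
train = positive_train + negative_train

# One fixed-length vector per training protein, plus its class label.
X_train = [pairwise_vector(s, train) for s in train]
y_train = [+1] * len(positive_train) + [-1] * len(negative_train)

# The target protein is vectorized against the same training set.
target_vector = pairwise_vector("MKVLAG", train)
```

The vectors in `X_train` with labels `y_train` are what a binary SVM would be trained on; `target_vector` is what it would then classify.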
6
Experiment: known protein families
Jaakkola, Diekhans and Haussler 1999
7
Vectorization
9
A measure of sensitivity and specificity
ROC (receiver operating characteristic) score: the normalized area under a curve that plots true positives as a function of false positives.
[Figure: example curves with ROC = 1, ROC = 0, and ROC = 0.67]
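The ROC score defined on this slide can be computed directly from a ranked list of test examples. The sketch below is one straightforward way to do it, assuming the examples are already sorted from highest to lowest classifier score and that there are no tied scores; the function name and the example rankings are illustrative and are not taken from the slide's figure.

```python
from typing import Sequence


def roc_score(labels_ranked: Sequence[int]) -> float:
    """Normalized area under the true-positives vs. false-positives curve.

    labels_ranked holds the true labels (1 = positive, 0 = negative),
    ordered from the highest-scoring example to the lowest.
    """
    n_pos = sum(labels_ranked)
    n_neg = len(labels_ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    true_pos = 0
    area = 0
    for label in labels_ranked:
        if label == 1:
            true_pos += 1        # curve steps up by one true positive
        else:
            area += true_pos     # curve steps right; add the column height
    return area / (n_pos * n_neg)


perfect = roc_score([1, 1, 0, 0])    # every positive ranked above every negative
worst = roc_score([0, 0, 1, 1])      # every positive ranked below every negative
partial = roc_score([1, 1, 0, 1])    # one positive ranked below the negative
```

Dividing by the number of positives times the number of negatives is what makes the area "normalized", so the score always lies between 0 and 1.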
10
Performance Comparison (1)
12
Using Phylogenetic Profiles & SVMs
Gene: YAL001C

E-value    Phylogenetic profile
0.122      1
1.064      0
3.589      0
0.008      1
0.692      1
8.49       0
14.79      0
0.584      1
1.567      0
0.324      1
0.002      1
3.456      0
2.135      0
0.142      1
0.001      1
0.112      1
1.274      0
0.234      1
4.562      0
3.934      0
0.489      1
0.002      1
2.421      0
0.112      1
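Reading each entry on this slide as an E-value followed by a 0/1 profile bit, the bits are consistent with thresholding the E-values. The sketch below illustrates that conversion. The threshold of 1.0 is an assumption inferred from the listed values, not something the slide states, and the function name is hypothetical.

```python
from typing import List, Sequence


def binarize_profile(e_values: Sequence[float],
                     threshold: float = 1.0) -> List[int]:
    """Map each E-value to 1 (homolog assumed present) or 0 (assumed absent).

    The threshold is an assumed cutoff; smaller E-values mean stronger hits.
    """
    return [1 if e < threshold else 0 for e in e_values]


# First eight E-values listed for gene YAL001C on the slide.
e_values = [0.122, 1.064, 3.589, 0.008, 0.692, 8.49, 14.79, 0.584]
profile = binarize_profile(e_values)
```

Each position of the resulting vector corresponds to one reference genome, so genes with similar vectors have similar presence/absence patterns across genomes.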
13
Phylogenetic profiles and evolution patterns
[Figure: a phylogenetic tree with 0/1 values at its nodes, showing one candidate evolution pattern for a profile x]
Impossible to know for sure if the gene followed exactly this evolution pattern.
14
Tree Kernel (Vert, 2002)
For a phylogenetic profile x and an evolution pattern e:
  p(e) quantifies how "natural" the pattern is
  p(x|e) quantifies how likely the pattern e is the "true history" of the profile x
Tree kernel: $K_{\mathrm{tree}}(x, y) = \sum_{e} p(e)\, p(x|e)\, p(y|e)$
Can be proved to be a kernel.
Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
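To make the sum over evolution patterns concrete, here is a brute-force toy version of the kernel formula. It is only a sketch under strong simplifying assumptions: the patterns are taken to be all binary vectors of the same length as the profile, p(e) is uniform, and p(x|e) is an independent per-position noise model with error rate `eps`. None of these choices comes from the slide; in the actual tree kernel the probabilities are derived from a model on the phylogenetic tree, and the sum is not computed by enumeration.

```python
from itertools import product
from typing import Sequence


def likelihood(x: Sequence[int], e: Sequence[int], eps: float = 0.1) -> float:
    """Toy p(x|e): each bit of x agrees with e with probability 1 - eps."""
    p = 1.0
    for xi, ei in zip(x, e):
        p *= (1.0 - eps) if xi == ei else eps
    return p


def pattern_kernel(x: Sequence[int], y: Sequence[int],
                   eps: float = 0.1) -> float:
    """Brute-force K(x, y) = sum over e of p(e) * p(x|e) * p(y|e)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    prior = 1.0 / 2 ** n                 # toy uniform p(e)
    return sum(prior * likelihood(x, e, eps) * likelihood(y, e, eps)
               for e in product((0, 1), repeat=n))


k_same = pattern_kernel([1, 0, 1], [1, 0, 1])
k_diff = pattern_kernel([1, 0, 1], [1, 0, 0])
```

Because every term is a product of a factor depending only on x and a factor depending only on y, weighted by a non-negative p(e), the sum is an inner product of the feature vectors with entries sqrt(p(e)) p(x|e), which is why a function of this form is a valid kernel.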
15
Tree-Encoded Profile (Narra & Liao, 2004)
[Figure values as extracted: profile bits 1 1 0 1 0 0 0 1 1; node values 1, 0.33, 0.67, 0.34, 0.5, 0.75, 0.55; label "Post-order traversal"]
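A tree-encoded profile can be sketched as a post-order traversal that emits one value per tree node. In the sketch below, each internal node is given the average of its children's values; that scoring rule is an assumption that is consistent with fractions such as 0.33, 0.5, and 0.75 on the slide, but the slide itself does not state it. The tree shape, the profile, and the function names are made up for illustration.

```python
from typing import List, Sequence, Union

# A leaf is an index into the profile; an internal node is a list of children.
Tree = Union[int, List["Tree"]]


def encode(tree: Tree, profile: Sequence[float], out: List[float]) -> float:
    """Append node values to out in post-order; return this node's value."""
    if isinstance(tree, int):
        value = float(profile[tree])             # leaf: the profile bit itself
    else:
        child_values = [encode(child, profile, out) for child in tree]
        value = sum(child_values) / len(child_values)   # assumed: average
    out.append(value)                            # children first, then the node
    return value


def tree_encoded_profile(tree: Tree, profile: Sequence[float]) -> List[float]:
    """Return the profile extended with internal-node values, in post-order."""
    out: List[float] = []
    encode(tree, profile, out)
    return out


# Hypothetical tree: root with two clades, over five genomes.
tree = [[0, 1, 2], [3, 4]]
profile = [1, 0, 0, 1, 1]
encoded = tree_encoded_profile(tree, profile)
```

The encoded vector is longer than the original profile because it carries one extra entry per internal node, which is how the tree structure becomes visible to a kernel that only sees vectors.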
17
Using Support Vector Machines
18
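Kernel function (formula not preserved): where r = 0.10
Soft margin regularization: C = 1.50
Coding scheme: BIN21
$L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \left( K(x_i \cdot x_j) + \delta_{ij}/C \right)$
Evaluation:
  $Q_3 = (P_1 + P_2 + P_3)/N$
  $C = (TP \times TN - FP \times FN) / \sqrt{PP \times PN \times AP \times AN}$
  SOV: segment overlap accuracy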
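The two closed-form evaluation measures on this slide can be sketched in a few lines. The code assumes three secondary-structure states written as the letters H, E, and C, reads P1, P2, P3 as the numbers of correctly predicted residues in each state and N as the total number of residues, and reads PP, PN, AP, AN as predicted positives, predicted negatives, actual positives, and actual negatives. Those readings, the function names, and the example strings are assumptions; SOV is not covered here.

```python
import math
from typing import Sequence


def q3(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Three-state accuracy: correctly predicted residues over all residues."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def matthews(predicted: Sequence[str], actual: Sequence[str],
             state: str) -> float:
    """Matthews correlation coefficient for one state against the rest."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == state and a == state for p, a in pairs)
    tn = sum(p != state and a != state for p, a in pairs)
    fp = sum(p == state and a != state for p, a in pairs)
    fn = sum(p != state and a == state for p, a in pairs)
    # PP * PN * AP * AN, written out in terms of the four counts.
    denom = math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up eight-residue example.
actual = "HHHEECCC"
predicted = "HHEEECCH"
accuracy = q3(predicted, actual)
helix_c = matthews(predicted, actual, "H")
```

Q3 alone can look good when one state dominates the data, which is one reason a per-state correlation coefficient is usually reported alongside it.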
19
Design tertiary classifiers
21
Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)
22
A two-stage SVM