CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.


1 CISC 667 Intro to Bioinformatics (Fall 2005): Support Vector Machines (II), Bioinformatics Applications


5 Combining pairwise similarity with SVMs for protein homology detection
[Figure labels: protein homologs (positive train); protein non-homologs (negative train); positive pairwise score vectors; negative pairwise score vectors; support vector machine; binary classification; target protein of unknown function (testing data); steps 1, 2, 3.]
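The workflow on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: `kmer_similarity` is a hypothetical stand-in for a real pairwise alignment score, and all sequences are made up.

```python
def kmers(seq, k=2):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=2):
    """Toy similarity: number of shared k-mers.

    Assumption: this replaces a real pairwise alignment score.
    """
    return len(kmers(a, k) & kmers(b, k))


def pairwise_score_vector(seq, train_seqs):
    """Represent a sequence by its similarity to every training sequence."""
    return [kmer_similarity(seq, t) for t in train_seqs]


# Hypothetical training data: homologs (positive) and non-homologs (negative).
positives = ["MKVLA", "MKVIA"]
negatives = ["GGSGG", "PPGPP"]
train = positives + negatives

# One fixed-length score vector per training protein, with its class label.
X_train = [pairwise_score_vector(s, train) for s in train]
y_train = [1] * len(positives) + [-1] * len(negatives)

# The target protein of unknown function is vectorized the same way.
x_test = pairwise_score_vector("MKVLG", train)
```

The vectors in `X_train` with labels `y_train` could then be passed to any SVM trainer, and `x_test` to the trained classifier.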

6 Experiment: known protein families (Jaakkola, Diekhans and Haussler, 1999)

7 Vectorization

9 A measure of sensitivity and specificity
ROC: the receiver operating characteristic score is the normalized area under a curve that plots true positives as a function of false positives.
[Figure: example curves with ROC = 1, ROC = 0 and ROC = 0.67.]
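A minimal sketch of how such a score can be computed from a ranked list of predictions. The function name `roc_score` and the toy data are assumptions for illustration, and tied scores are not handled.

```python
def roc_score(scores, labels):
    """Normalized area under the true-positives vs. false-positives curve.

    scores: classifier outputs, higher means more likely positive.
    labels: 1 for positives, 0 for negatives.
    Assumption: scores are distinct (ties are not treated specially).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    area = 0
    for _, label in ranked:
        if label == 1:
            true_positives += 1
        else:
            # Each false positive adds a column as tall as the current TP count.
            area += true_positives
    n_pos = true_positives
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    return area / (n_pos * n_neg)


perfect = roc_score([6, 5, 4, 3], [1, 1, 0, 0])               # positives ranked first
worst = roc_score([6, 5, 4, 3], [0, 0, 1, 1])                 # negatives ranked first
mixed = roc_score([6, 5, 4, 3, 2, 1], [1, 0, 1, 0, 1, 0])     # alternating ranking
```

Here `perfect` is 1, `worst` is 0, and the alternating ranking in `mixed` gives 6/9, about 0.67.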

10 Performance Comparison (1)

12 Using Phylogenetic Profiles & SVMs
Gene: YAL001C
E-value    Phylogenetic profile
0.122      1
1.064      0
3.589      0
0.008      1
0.692      1
8.49       0
14.79      0
0.584      1
1.567      0
0.324      1
0.002      1
3.456      0
2.135      0
0.142      1
0.001      1
0.112      1
1.274      0
0.234      1
4.562      0
3.934      0
0.489      1
0.002      1
2.421      0
0.112      1
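A minimal sketch of turning E-values into a binary phylogenetic profile, reading the slide's numbers as E-value/bit pairs (for example, 0.122 paired with 1). The cutoff of 1.0 is an assumption: it is not stated on the slide, though it is consistent with every pair shown there.

```python
E_VALUE_CUTOFF = 1.0  # assumption: inferred from the slide's values, not stated


def to_profile(e_values, cutoff=E_VALUE_CUTOFF):
    """Mark a genome 1 if the best hit is below the cutoff, else 0."""
    return [1 if e < cutoff else 0 for e in e_values]


# First eight E-values listed for YAL001C on the slide.
e_values = [0.122, 1.064, 3.589, 0.008, 0.692, 8.49, 14.79, 0.584]
profile = to_profile(e_values)
```

With these inputs, `profile` reproduces the first eight bits shown on the slide.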

13 Phylogenetic profiles and evolution patterns
[Figure: an evolution pattern drawn as 0/1 labels on the nodes of a phylogenetic tree, with the profile x.]
Impossible to know for sure if the gene followed exactly this evolution pattern.

14 Tree Kernel (Vert, 2002)
• For a phylogenetic profile x and an evolution pattern e:
  p(e) quantifies how “natural” the pattern is;
  p(x|e) quantifies how likely the pattern e is the “true history” of the profile x.
• Tree kernel: $K_{\text{tree}}(x, y) = \sum_{e} p(e)\, p(x \mid e)\, p(y \mid e)$
• Can be proved to be a kernel.
• Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
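A minimal numeric sketch of the tree kernel sum, assuming the evolution patterns can be enumerated explicitly, which is only feasible for toy cases. All probabilities below are hypothetical, and `feature_map` is an illustrative helper: it shows that the sum is an inner product of the features sqrt(p(e))·p(x|e), which is one way to see why it is a valid kernel.

```python
import math


def tree_kernel(p_e, p_x_given_e, p_y_given_e):
    """K_tree(x, y) = sum over patterns e of p(e) * p(x|e) * p(y|e)."""
    return sum(pe * px * py
               for pe, px, py in zip(p_e, p_x_given_e, p_y_given_e))


def feature_map(p_e, p_x_given_e):
    """Explicit features: one coordinate sqrt(p(e)) * p(x|e) per pattern e."""
    return [math.sqrt(pe) * px for pe, px in zip(p_e, p_x_given_e)]


# Hypothetical values for three evolution patterns.
p_e = [0.5, 0.3, 0.2]   # prior p(e) of each pattern
p_x = [0.9, 0.1, 0.2]   # p(x|e) for profile x
p_y = [0.8, 0.2, 0.1]   # p(y|e) for profile y

k_xy = tree_kernel(p_e, p_x, p_y)
dot = sum(a * b for a, b in zip(feature_map(p_e, p_x), feature_map(p_e, p_y)))
```

Both `k_xy` and `dot` come to 0.37 (up to floating-point rounding), dominated by the first pattern, which is probable and explains both profiles well.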

15 Tree-Encoded Profile (Narra & Liao, 2004)
[Figure: a phylogenetic tree with 0/1 profile values at the leaves and scores (0.33, 0.67, 0.34, 0.5, 0.75, 0.55) at the interior nodes; a post-order traversal of the tree yields the tree-encoded profile.]
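A minimal sketch of a tree encoding by post-order traversal. Two details are assumptions, since the slide's figure did not survive in the transcript: each interior node is scored as the mean of its children, and the output lists every node (leaves and interior nodes) in visit order. The nested-list tree is hypothetical.

```python
def tree_encode(tree):
    """Return node scores in post-order.

    tree: a number (leaf profile value) or a list of subtrees.
    Assumption: an interior node's score is the mean of its children's scores.
    """
    encoded = []

    def visit(node):
        if isinstance(node, (int, float)):
            score = float(node)
        else:
            child_scores = [visit(child) for child in node]
            score = sum(child_scores) / len(child_scores)
        # Post-order: a node is recorded only after all of its children.
        encoded.append(score)
        return score

    visit(tree)
    return encoded


# Hypothetical tree: two clades with two leaves each.
encoded_profile = tree_encode([[1, 0], [1, 1]])
```

For this tree the result is `[1.0, 0.0, 0.5, 1.0, 1.0, 1.0, 0.75]`: each clade's leaves, then the clade's score, and the root last.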

17 Using Support Vector Machines

18 Kernel function: (formula missing in the transcript), where r = 0.10
Soft margin regularization: C = 1.50
Coding scheme: BIN21
$L(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \delta_{ij}/C \right)$
Evaluation:
Q3 = (P1 + P2 + P3)/N
$C = (TP \times TN - FP \times FN) / \sqrt{PP \times PN \times AP \times AN}$
SOV: segment overlap accuracy
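A minimal sketch of the two evaluation formulas. The reading of the abbreviations is an assumption: P1, P2, P3 are taken as the numbers of correctly predicted residues in each of three classes and N as the total number of residues; PP and PN as predicted positives and negatives; AP and AN as actual positives and negatives. The label strings are made up.

```python
import math


def q3(predicted, actual):
    """Fraction of residues whose class is predicted correctly.

    Equivalent to (P1 + P2 + P3) / N when summing correct calls per class.
    """
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def correlation(tp, tn, fp, fn):
    """C = (TP*TN - FP*FN) / sqrt(PP*PN*AP*AN) for one class."""
    pp, pn = tp + fp, tn + fn   # predicted positives / negatives (assumed reading)
    ap, an = tp + fn, tn + fp   # actual positives / negatives (assumed reading)
    denominator = math.sqrt(pp * pn * ap * an)
    # Convention chosen here: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


accuracy = q3("HHEEC", "HHECC")          # hypothetical three-class labels
perfect_c = correlation(5, 5, 0, 0)      # every prediction correct
inverted_c = correlation(0, 0, 5, 5)     # every prediction wrong
```

Four of the five labels agree, so `accuracy` is 0.8; the correlation is 1 for the perfect predictor and -1 for the inverted one.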

19 Design tertiary classifiers

21 Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)

22 A two-stage SVM
