Protein Classification Using Averaged Perceptron SVM

1 Protein Classification Using Averaged Perceptron SVM
CS6772 Project Presentation, 12/03/2003
Eugene Ie

2 Protein Sequence Classification
Protein = ()* |  | = 20 amino acids Easy to sequence proteins, difficult to obtain structure 3D Structure Sequence VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR ? Class Globin family Globin-like superfamily Function Oxygen transport

3 Sequence Alignment vs. Classification
Close homology: sequence similarity through alignment
  SGFIEEDELKLFL
  SGFIEEEELKFVL
Distant (remote) homology: sequence classification with a classifier

4 Structural Hierarchy of Proteins
SCOP hierarchy: Fold, Superfamily, Family
[Diagram labels: Positive Training Set, Positive Test Set, Negative Training Set, Negative Test Set]
Remote homologs: structure and function conserved, sequence similarity low

5 Remote Homology Detection
Discriminative supervised learning approach to protein classification
Approach: Support Vector Machines with String Kernels
  C. Leslie, E. Eskin, J. Weston, and W. Noble, Mismatch String Kernels for SVM Protein Classification.
  C. Leslie and R. Kuang, Fast Kernels for Inexact String Matching.

6 QP SVM Training
[Diagram: Sequence Training Data, QP Solver]

7 Averaged Perceptron SVM Training
Training Algorithm: Y. Freund and R. Schapire, Large Margin Classification Using the Perceptron Algorithm.
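The training loop of Freund and Schapire's voted perceptron can be sketched in kernel form as below. This is a minimal Python sketch under stated assumptions, not the code from the presentation: the function names, the toy 2-mer kernel, the toy data, and the tie-breaking rule sign(0) = -1 are all chosen here for illustration only.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[str, str], float]


def toy_spectrum_kernel(a: str, b: str, k: int = 2) -> float:
    """Hypothetical stand-in kernel: dot product of k-mer count vectors."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(n * cb[w] for w, n in ca.items()))


def train_voted_perceptron(
    xs: Sequence[str], ys: Sequence[int], kernel: Kernel, epochs: int = 1
) -> Tuple[List[int], List[int]]:
    """Kernel voted perceptron training.

    Returns (mistakes, counts):
      mistakes[j-1] is the training index that produced prediction vector v_j,
      counts[j] is the number of examples v_j classified correctly
      (v_0 is the zero vector, so len(counts) == len(mistakes) + 1).
    """
    mistakes: List[int] = []
    counts: List[int] = [0]
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            # v . x expressed through kernel products with past mistakes only.
            score = sum(ys[m] * kernel(xs[m], x) for m in mistakes)
            pred = 1 if score > 0 else -1  # assumption: sign(0) = -1
            if pred == y:
                counts[-1] += 1            # current vector survives
            else:
                mistakes.append(i)         # implicit update v <- v + y * x
                counts.append(1)
    return mistakes, counts


# Toy data (hypothetical): two "families" with disjoint 2-mers.
xs = ["AAAA", "AAAT", "GGGG", "GGGC"]
ys = [1, 1, -1, -1]
mistakes, counts = train_voted_perceptron(xs, ys, toy_spectrum_kernel, epochs=2)
print(mistakes, counts)
```

Only the mistake indices and their survival counts are stored, so the model size grows with the number of mistakes rather than with the number of training sequences.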

8 Averaged Perceptron SVM Training
[Diagram: Sequence Training Data, Run Perceptron Algorithm (iterate t epochs), Final Weight Vector and Voting Weights]
Example sequences:
  >VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  >TYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Total: n sequences + n labels
Generalized bound for k:
  s = no. of dimensions in feature space
  k = no. of mistakes made during perceptron run
SCOP experiments show: for average n ~ 1000, average k ~ 50-60

9 Averaged Perceptron SVM Classification
Testing Algorithm:
Note: Only k kernel products with unknown sequence x need to be computed.
Recurrence relation: M is the set of “mistake indices”
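One way to read the note above is that each successive prediction vector differs from the previous one by a single mistake example, so its score on x can be updated with one new kernel product. The Python sketch below illustrates that recurrence under assumptions: the function names, the toy kernel, the hard-coded mistake list and counts, and the choice between an averaged and a voted decision rule are illustrative and are not taken from the presentation.

```python
from collections import Counter
from typing import Callable, Sequence

Kernel = Callable[[str, str], float]


def shared_kmer_kernel(a: str, b: str, k: int = 2) -> float:
    """Hypothetical stand-in kernel: dot product of k-mer count vectors."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(n * cb[w] for w, n in ca.items()))


def sign(v: float) -> int:
    return 1 if v > 0 else -1  # assumption: sign(0) = -1


def classify(
    x: str,
    xs: Sequence[str],
    ys: Sequence[int],
    mistakes: Sequence[int],   # M: indices of training mistakes, in order
    counts: Sequence[int],     # counts[j]: survival count of vector v_j
    kernel: Kernel,
    voted: bool = False,
) -> int:
    """Predict the label of x using len(mistakes) kernel products."""
    score = 0.0  # v_0 . x, where v_0 is the zero vector
    total = counts[0] * (sign(score) if voted else score)
    for j, m in enumerate(mistakes, start=1):
        # Recurrence: v_j . x = v_{j-1} . x + y_m * K(x_m, x)
        score += ys[m] * kernel(xs[m], x)
        total += counts[j] * (sign(score) if voted else score)
    return sign(total)


# Toy model (hypothetical): one mistake on training example 0.
xs = ["AAAA", "AAAT", "GGGG", "GGGC"]
ys = [1, 1, -1, -1]
model_mistakes, model_counts = [0], [0, 8]
print(classify("AAAC", xs, ys, model_mistakes, model_counts, shared_kmer_kernel))
print(classify("GGGT", xs, ys, model_mistakes, model_counts, shared_kmer_kernel))
```

With `voted=False` the survival counts weight the raw scores (an averaged rule); with `voted=True` they weight the signs of the scores (a voted rule). Either way the loop touches only the sequences in M.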

10 Implementation Details
Built on top of protclass (Protein Classification) platform
  Java Platform
Classification Task
  Hash table scan instead of Mismatch Trie
  Generate mismatch mappings once using shifts
  Dynamic kernel matrix storage
  Still needs debugging
Speed/Space Performance
  ~80% reduction in space requirement
  ~50% reduction in training time
  ~50% reduction in testing time
  Mainly from simple online algorithm
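As an illustration of a hash-table approach to the mismatch kernel, the Python sketch below accumulates (k, m)-mismatch feature counts in a dictionary and caches each k-mer's mismatch neighborhood so it is generated only once. It is a brute-force sketch under assumptions, not the protclass code: the function names and default parameters are hypothetical, and the shift-based generation mentioned above is not reproduced.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter alphabet


@lru_cache(maxsize=None)
def neighborhood(kmer: str, m: int) -> frozenset:
    """All k-mers within Hamming distance m of kmer (cached per k-mer)."""
    out = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(AMINO_ACIDS, repeat=d):
                chars = list(kmer)
                for p, c in zip(positions, letters):
                    chars[p] = c
                out.add("".join(chars))
    return frozenset(out)


def mismatch_features(seq: str, k: int = 3, m: int = 1) -> Counter:
    """Hash table mapping each feature k-mer to its mismatch count."""
    feats: Counter = Counter()
    for i in range(len(seq) - k + 1):
        # Every k-mer in the neighborhood gets one count per occurrence.
        feats.update(neighborhood(seq[i:i + k], m))
    return feats


def mismatch_kernel(a: str, b: str, k: int = 3, m: int = 1) -> int:
    """Dot product of the two mismatch feature tables."""
    fa, fb = mismatch_features(a, k, m), mismatch_features(b, k, m)
    if len(fa) > len(fb):
        fa, fb = fb, fa  # scan the smaller table
    return sum(n * fb[w] for w, n in fa.items())


print(len(neighborhood("AAA", 1)))
print(mismatch_kernel("AAA", "AAC"))
```

The neighborhood of a 3-mer with one allowed mismatch has 1 + 3 × 19 = 58 members, so this enumeration is only practical for small k and m; it is meant to show the data structure, not to be fast.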

