Protein Structural Classification


1 Protein Structural Classification

2 Structural Classification Databases, Sequence Pairwise Comparison, Discriminative Classification
Structural classification databases: SCOP, CATH, FSSP
Sequence pairwise comparison: Smith-Waterman, BLAST, PSI-BLAST, rank propagation, SAM-T98
Discriminative classification: SVM-pairwise, mismatch kernel, EMOTIF kernel, I-Site kernel, semi-supervised kernel

3 SCOP
[Figure: SCOP hierarchy (Fold, Superfamily, Family) showing the positive and negative training sets and the positive and negative test sets]
Family: sequence identity > 30%, or functions and structures are very similar.
Superfamily: low sequence similarity, but functional features suggest a probable common evolutionary origin.
Common fold: same major secondary structures in the same arrangement with the same topological connections.

4 CATH
Class, Architecture, Topology, Homologous superfamily, Sequence family

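5 Local alignment: Smith-Waterman algorithm
For two strings x and y, a local alignment with gaps pairs up, in order, positions of a substring of x with positions of a substring of y. The score of an alignment is the sum of the substitution scores of the aligned pairs minus the gap penalties. The Smith-Waterman score is the maximum score over all local alignments of x and y. Thanks to Jean Philippe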
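
The dynamic program behind the Smith-Waterman score can be sketched as follows. This is a minimal illustration, not the scoring scheme from the slide: the match, mismatch and linear gap values are assumptions chosen for readability, whereas protein comparison would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings x and y.

    The scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty).
    """
    rows, cols = len(x) + 1, len(y) + 1
    # H[i][j] = best score of a local alignment ending at x[i-1] and y[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align x[i-1] with y[j-1]
                H[i - 1][j] + gap,      # x[i-1] aligned to a gap
                H[i][j - 1] + gap,      # y[j-1] aligned to a gap
            )
            best = max(best, H[i][j])
    return best


# The shared substring "ACGT" gives 4 matches at 2 points each.
print(smith_waterman("XXACGTXX", "YYACGTYY"))  # 8
```

Taking the maximum with 0 in every cell is what makes the alignment local: a poorly scoring prefix is dropped instead of being carried along.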

6 BLAST: a heuristic algorithm for matching DNA/protein sequences
Idea: true matches are likely to contain a short stretch of identity. Build a list of 'neighborhood words' of the query sequence. Search the database with the list; whenever there is a match, do a 'hit extension', stopping at the maximum-scoring extension. Altschul, Madden, Schäffer, Zhang, et al., 1997
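
The seed-and-extend idea can be sketched in a few lines. This is a toy under stated assumptions, not BLAST itself: it seeds on exact words only (BLAST also accepts similar 'neighborhood words' scored with a substitution matrix), and the function names, the word size `w`, the match/mismatch scores and the `drop` threshold are all hypothetical choices for illustration.

```python
def word_hits(query, subject, w=3):
    """Yield (i, j) such that query[i:i+w] == subject[j:j+w]."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            yield i, j


def extend_hit(query, subject, i, j, w=3, match=1, mismatch=-1, drop=2):
    """Ungapped extension of a word hit in both directions.

    Returns (score, q_start, q_end) of the maximum-scoring extension.
    Extension in one direction stops once the running score falls more
    than `drop` below the best score seen in that direction.
    """
    def walk(qi, sj, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, r_len = walk(i + w, j + w, 1)
    left, l_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - l_len, i + w + r_len


query, subject = "TTACGTACGG", "CCACGTACAA"
hits = list(word_hits(query, subject))
print(extend_hit(query, subject, 2, 2))  # (6, 2, 8): query[2:8] == "ACGTAC"
```

Indexing the query words once and scanning the subject against that index is what makes the search fast: only positions that share a word are ever extended.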

7 PSI-BLAST: Position-specific iterated BLAST
Only double hits within a certain range of each other are extended. A gapped alignment uses dynamic programming to extend a central pair of aligned residues in both directions. PSI-BLAST can take a PSSM (position-specific scoring matrix) as input to search the database. Altschul, Madden, Schäffer, Zhang, et al., 1997
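
To make the role of the PSSM concrete, here is a minimal sketch of building a log-odds matrix from a gap-free alignment and scanning a sequence with it. The uniform background, the fixed pseudocount and the function names are assumptions for illustration; PSI-BLAST itself uses sequence weighting, data-dependent pseudocounts and gapped alignment.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds PSSM from equal-length, gap-free aligned sequences.

    Assumes a uniform background distribution over the alphabet.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2(((column.count(a) + pseudocount) / total) / background)
            for a in alphabet
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the best ungapped match of the PSSM."""
    width = len(pssm)
    return max(
        (sum(pssm[k][sequence[s + k]] for k in range(width)), s)
        for s in range(len(sequence) - width + 1)
    )


pssm = build_pssm(["ACD", "ACD", "ACE"])
score, start = best_window(pssm, "WWACDWW")
print(start)  # 2
```

Unlike a fixed substitution matrix, each column has its own scores, so a residue that is conserved at one position of the family is rewarded there and nowhere else.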

8 Local and Global Consistency
W is the affinity matrix. D is a diagonal matrix. Iterate F(t). F* is the limit of the sequence {F(t)}. Zhou, Bousquet, Lal, Weston, and Schölkopf, 2003
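
The iteration can be sketched in plain Python. The formulas on the slide did not survive, so the sketch assumes the update as it is usually stated for this method: D holds the row sums of the affinity matrix W on its diagonal, S = D^(-1/2) W D^(-1/2), and F(t+1) = alpha * S * F(t) + (1 - alpha) * Y, where Y holds the initial labels. The function name, `alpha` and the toy graph are illustrative assumptions.

```python
import math


def local_global_consistency(W, Y, alpha=0.9, iterations=200):
    """Iterate F(t+1) = alpha * S * F(t) + (1 - alpha) * Y and return F.

    W: symmetric affinity matrix (list of lists) with a zero diagonal.
    Y: n x c matrix, Y[i][c] = 1 if point i is labeled with class c, else 0.
    """
    n, classes = len(W), len(Y[0])
    d = [sum(row) for row in W]  # diagonal of D
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(classes)] for i in range(n)]
    return F


# Chain 0 - 1 - 2 - 3 with a weak link between 1 and 2.
W = [[0, 1, 0, 0],
     [1, 0, 0.1, 0],
     [0, 0.1, 0, 1],
     [0, 0, 1, 0]]
Y = [[1, 0], [0, 0], [0, 0], [0, 1]]  # only points 0 and 3 are labeled
F = local_global_consistency(W, Y)
labels = [max(range(2), key=lambda c: F[i][c]) for i in range(4)]
print(labels)  # [0, 0, 1, 1]
```

Each unlabeled point ends up with the label of the side of the graph it is strongly connected to, which is the "global consistency" the slide title refers to.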

9 Rank propagation
Protein similarity network. Graph nodes: protein sequences in the database. Directed edges: an exponential function of the PSI-BLAST E-value (with the destination node as query). Activation value at each node: the similarity to the query sequence. The method exploits the structure of the protein similarity network. Weston, Elisseeff, Zhou, Leslie and Noble, 2004
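
A small sketch shows how activation spreading can rank a protein that the query never hits directly. It is written in the spirit of the slide and is not the published algorithm verbatim: the update rule, `sigma`, `alpha`, the function name and the toy E-values are assumptions. Here `evalues[(src, dst)]` stands for the E-value on the edge from `src` to `dst`, and missing edges get weight 0.

```python
import math


def rank_propagation(evalues, nodes, query, sigma=100.0, alpha=0.5,
                     iterations=20):
    """Rank database proteins by activation spread from the query node."""
    def weight(src, dst):
        e = evalues.get((src, dst))
        return math.exp(-e / sigma) if e is not None else 0.0

    others = [n for n in nodes if n != query]
    y = {n: 0.0 for n in others}  # activation values
    for _ in range(iterations):
        # Direct similarity to the query plus damped activation from neighbors.
        y = {
            dst: weight(query, dst)
            + alpha * sum(weight(src, dst) * y[src]
                          for src in others if src != dst)
            for dst in others
        }
    return sorted(others, key=lambda n: y[n], reverse=True)


# "b" has no direct edge from the query but is very close to "a";
# "c" has only a weak direct edge from the query.
evalues = {("q", "a"): 0.001, ("a", "b"): 0.001, ("q", "c"): 300.0}
ranking = rank_propagation(evalues, ["q", "a", "b", "c"], "q")
print(ranking)  # ['a', 'b', 'c']
```

A ranking by direct E-value alone would place "b" last, since it has no edge from the query; propagation through "a" moves it above the weak direct hit "c".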

10 SAM-T98
First iteration: use the query sequence to search the NR database with WU-BLASTP and build an alignment of the homologs found. Iterations 2 to 4: use the alignment from the previous iteration to find more homologs with WU-BLASTP, and update the alignment with the new homologs. Build an HMM from the final alignment. The HMM of the query sequence is used to search the database, or the query sequence can be searched against an HMM database. Karplus, Barrett and Hughey, 1999

11 To do it in a discriminative manner with SVM…

12 Fisher Kernel
An HMM (or more than one) is built for each family. The kernel function is derived from the Fisher scores of each sequence given an HMM H1. Jaakkola, Diekhans and Haussler, 2000
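
The formula that followed "H1:" is missing from the transcript. As a hedged reconstruction of the standard definitions, assuming theta denotes the parameters of the HMM H1 and sigma is a kernel width (both symbols are introduced here, not taken from the slide), the Fisher score and two common kernels built on it can be written as:

```latex
% Fisher score: gradient of the log-likelihood of sequence x under HMM H_1
U_x = \nabla_{\theta} \log P(x \mid H_1, \theta)

% General Fisher kernel, with I the Fisher information matrix
K(x, y) = U_x^{\top} I^{-1} U_y, \qquad
I = \mathbb{E}_x\!\left[ U_x U_x^{\top} \right]

% Gaussian kernel on Fisher scores, the form commonly used with SVMs
K(x, y) = \exp\!\left( -\frac{\lVert U_x - U_y \rVert^{2}}{2\sigma^{2}} \right)
```

Each sequence is thus mapped to a fixed-length vector, whatever its length, which is what lets an SVM be trained on top of the generative HMM.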

