Protein Structural Classification

Structural classification databases: SCOP, CATH, FSSP
Sequence pairwise comparison: Smith-Waterman, BLAST, PSI-BLAST, rank propagation, SAM-T98
Discriminative classification: SVM-pairwise, mismatch kernel, EMOTIF kernel, I-Site kernel, semi-supervised kernel

SCOP
Hierarchy: Fold, Superfamily, Family
Family: sequence identity > 30%, or functions and structures are very similar
Superfamily: low sequence similarity, but functional features suggest a probable common evolutionary origin
Common fold: same major secondary structures in the same arrangement with the same topological connections
[Figure: SCOP hierarchy (Fold, Superfamily, Family) with positive and negative training and test sets marked]

CATH
Hierarchy: Class, Architecture, Topology, Homologous superfamily, Sequence family

Local alignment: Smith-Waterman algorithm
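For two strings x and y, a local alignment with gaps pairs a substring of x with a substring of y, allowing gaps in either one.
The score of an alignment sums the substitution scores of the aligned residues and subtracts the gap penalties.
Smith-Waterman score: the maximum score over all local alignments of x and y.
Thanks to Jean Philippe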
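The recurrence behind the Smith-Waterman score can be sketched in a few lines of Python. This is a minimal illustration, not the slide's own formulation: the match/mismatch scores and the linear gap penalty are assumed values chosen for readability, whereas protein comparisons would normally use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings x and y.

    Assumptions for this sketch: simple match/mismatch scoring and a
    linear gap penalty (both hypothetical defaults).
    """
    rows, cols = len(x) + 1, len(y) + 1
    # H[i][j] = best score of a local alignment ending at x[i-1], y[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                H[i - 1][j] + gap,    # gap in y
                H[i][j - 1] + gap,    # gap in x
            )
            best = max(best, H[i][j])
    return best
```

The `0` option in the recurrence is what makes the alignment local: a poorly scoring prefix is dropped rather than carried along.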

BLAST: a heuristic algorithm for matching DNA/Protein sequences
Idea: true matches are likely to contain a short stretch of identity.
Build a list of "neighborhood words" from the query sequence.
Search the database with the list; whenever there is a match, do a "hit extension", stopping at the maximum-scoring extension.
Altschul, Madden, Schaffer, Zhang et al., 1997
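The seed-and-extend idea can be illustrated with a toy version in Python. This is a simplified sketch, not BLAST itself: it seeds only on exact words (real BLAST also accepts neighborhood words that score above a threshold under a substitution matrix), extends without gaps, and uses assumed values for the word size `w`, the match/mismatch scores, and the drop-off `x_drop`. All function names are hypothetical.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where an exact word of length w matches."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def _best_extension(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Ungapped extension in one direction; returns (best score, length at best)."""
    run = best = best_len = n = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        run += match if query[qi] == subject[sj] else mismatch
        n += 1
        if run > best:
            best, best_len = run, n
        elif best - run > x_drop:
            break  # score fell too far below the best seen so far
        qi += step
        sj += step
    return best, best_len


def best_hsp(query, subject, w=3, match=1, mismatch=-1, x_drop=3):
    """Best ungapped segment pair as (score, q_start, q_end, s_start), or None."""
    best = None
    for i, j in find_seeds(query, subject, w):
        right, r_len = _best_extension(query, subject, i + w, j + w, 1,
                                       match, mismatch, x_drop)
        left, l_len = _best_extension(query, subject, i - 1, j - 1, -1,
                                      match, mismatch, x_drop)
        hsp = (w * match + left + right, i - l_len, i + w + r_len, j - l_len)
        if best is None or hsp[0] > best[0]:
            best = hsp
    return best
```

For example, `best_hsp("MKTAYIAKQR", "GGMKTAYLAKQRGG")` extends the seed `MKT` across the single mismatch (I against L) and reports the whole query as one segment.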

PSI-BLAST: Position-specific iterated BLAST
Only double hits within a certain range are extended.
A gapped alignment uses dynamic programming to extend a central pair of aligned residues in both directions.
PSI-BLAST can take a PSSM (position-specific scoring matrix) as input to search the database.
Altschul, Madden, Schaffer, Zhang et al., 1997

Local and Global Consistency
Build an affinity matrix over the data points.
D is a diagonal matrix.
Iterate; F* is the limit of the sequence {F(t)}.
Zhou, Bousquet, Lal, Weston, and Schölkopf, 2003
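The iteration can be sketched in plain Python. The formulas on this slide did not survive, so the details below follow the usual statement of the cited method and should be read as assumptions: a Gaussian affinity W with zero diagonal, D holding the row sums of W, S = D^(-1/2) W D^(-1/2), and the update F(t+1) = alpha * S * F(t) + (1 - alpha) * Y, where Y encodes the known labels. The function name and default parameters are hypothetical.

```python
import math


def local_global_consistency(points, labels, n_classes,
                             sigma=1.0, alpha=0.9, n_iter=200):
    """Label propagation sketch; labels[i] is a class index or None if unlabeled."""
    n = len(points)
    # Gaussian affinity matrix W with zero diagonal (assumed form)
    W = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                   / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # D is diagonal with D_ii = sum_j W_ij; S = D^(-1/2) W D^(-1/2)
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Y holds the initial labels, one column per class
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # F(t+1) = alpha * S * F(t) + (1 - alpha) * Y
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Each point takes the class with the largest entry in its row of F
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]
```

With two well-separated clusters and one labeled point in each, the unlabeled points inherit the label of their own cluster. The fixed iteration count stands in for running to the limit F*.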

Rank propagation
Protein similarity network:
Graph nodes: protein sequences in the database
Directed edges: an exponential function of the PSI-BLAST e-value (destination node as query)
Activation value at each node: the similarity to the query sequence
Exploit the structure of the protein similarity network
Weston, Elisseeff, Zhou, Leslie and Noble, 2004
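The following Python sketch shows how activation might spread through such a network. It is an illustration under stated assumptions rather than the published procedure: edge weights are taken as exp(-e / sigma), weights are divided by the largest total incoming weight so the iteration stays bounded, and each node's activation is its direct edge from the query plus alpha times the activation arriving from its other neighbors. The function name, the `evalues` layout, and the defaults for `sigma`, `alpha`, and `n_iter` are all hypothetical.

```python
import math


def rank_propagation(nodes, evalues, query, sigma=100.0, alpha=0.9, n_iter=50):
    """Rank database sequences by activation spread from the query.

    evalues[(src, dst)] is the e-value of sequence src when dst is used
    as the query (destination node as query). Missing pairs mean no edge.
    """
    # Edge weight: an exponential function of the e-value (assumed form)
    K = {(s, d): math.exp(-e / sigma) for (s, d), e in evalues.items() if s != d}
    # Global rescaling so incoming weights sum to at most 1 (assumption)
    incoming = {v: 0.0 for v in nodes}
    for (s, d), k in K.items():
        incoming[d] += k
    scale = max(incoming.values()) or 1.0
    K = {edge: k / scale for edge, k in K.items()}

    y = {v: 0.0 for v in nodes}
    y[query] = 1.0
    for _ in range(n_iter):
        new_y = {query: 1.0}
        for v in nodes:
            if v == query:
                continue
            direct = K.get((query, v), 0.0)
            spread = sum(K.get((u, v), 0.0) * y[u]
                         for u in nodes if u != query and u != v)
            new_y[v] = direct + alpha * spread
        y = new_y
    ranking = sorted((v for v in nodes if v != query),
                     key=lambda v: y[v], reverse=True)
    return ranking, y
```

In a chain where the query matches sequence a strongly and a matches b strongly, b ends up ranked well above a sequence c that has the same weak direct e-value to the query, which is the effect of exploiting the network structure.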

SAM-T98
First iteration: use the query sequence to search the NR database with WU-BLASTP and build an alignment of the homologs found.
Iterations 2-4: take the alignment from the previous iteration to find more homologs with WU-BLASTP, and update the alignment with the new homologs found.
Build an HMM from the final alignment.
The HMM of the query sequence is used to search the database; alternatively, the query sequence can be searched against an HMM database.
Karplus, Barrett and Hughey, 1999

To do it in a discriminative manner with SVM…

Fisher kernel
An HMM (or more than one) is built for each family.
Derive the kernel function from the Fisher scores of each sequence given an HMM H1.
Jaakkola, Diekhans and Haussler, 2000
