1
Protein Ranking: From Local to Global Structure in the Protein Similarity Network. Authors: Jason Weston et al., PNAS. Presented by Tie Wang.
2
Outline: Introduction; Background; Method; Experiment; Analysis
3
Introduction: Subtle pairwise sequence similarities imply structural, functional, and evolutionary relations among DNA and protein sequences. Searching an online database of biosequences is analogous to searching the WWW: a search engine searches the database for the query and returns a ranked list. A protein ranking algorithm is presented for biosequence queries.
4
Background: Early algorithms focus only on pairwise sequence similarity (e.g., Smith-Waterman (SW) local alignment search). Statistical models use multiple alignments for similarity search (profile-based methods, PSI-BLAST). Global similarity search can be mapped onto a protein similarity network.
5
How to perform protein ranking? Underlying idea: Google ranking. Key feature: exploiting global structure by inferring it from the local hyperlink structure. Steps: 1. Construct a protein similarity network. 2. Add the query sequence. 3. Diffuse weights through the network. 4. Rank proteins upon convergence.
6
Algorithm
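The four steps (build the network, add the query, diffuse weights, rank on convergence) can be illustrated with a minimal sketch. The update rule, the damping factor `alpha`, the row normalization, and the toy similarity matrix below are assumptions chosen for illustration; the slide does not give the exact formula, so this should not be read as the authors' implementation.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


def diffuse_scores(similarity, query, alpha=0.8, tol=1e-10, max_iter=10_000):
    """Iterate score[j] = initial[j] + alpha * sum_i score[i] * k[i][j].

    similarity[i][j] is a non-negative edge weight between proteins i and j.
    The initial activation is the query's direct similarity to each protein.
    With alpha < 1 and row-normalized weights the iteration converges.
    """
    n = len(similarity)
    k = row_normalize(similarity)
    initial = list(similarity[query])
    scores = list(initial)
    for _ in range(max_iter):
        updated = [
            initial[j] + alpha * sum(scores[i] * k[i][j] for i in range(n))
            for j in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


def rank_proteins(similarity, query, **kwargs):
    """Return the non-query protein indices, best score first."""
    scores = diffuse_scores(similarity, query, **kwargs)
    candidates = [i for i in range(len(similarity)) if i != query]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


# Hypothetical toy network: protein 0 is the query.
# 0-1 and 1-2 are strong edges, 0-3 is a weak edge, 0-2 has no direct edge.
toy_similarity = [
    [0.0, 0.9, 0.0, 0.1],
    [0.9, 0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
toy_ranking = rank_proteins(toy_similarity, query=0)
```

In this toy network, protein 2 has no direct edge to the query but is reached through protein 1, so it ends up ranked above protein 3, which has only a weak direct edge. This is the sense in which diffusion uses global rather than purely local structure.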
7
Experiment: The SCOP database of protein 3-D structures is used as the gold standard. No two sequences have more than 95% similarity. The 7329 proteins are split into 379 superfamilies for training and 332 for testing. Three networks are generated using BLAST and PSI-BLAST.
8
Experiment. Value: [equation not preserved in the transcript], where S_j(i) is the E-value assigned to protein i given query j. The algorithm is compared with two other experiments: 1. only local structure is considered; 2. non-local edges are used without weak edges. The results show that the second is only slightly worse than our algorithm.
9
Analysis: Cluster structure (Bower et al., Science, vol. 306, 2004).
10
Motif-Based Protein Ranking by Network Propagation. Authors: Rui Kuang et al., Bioinformatics. Presented by Tie Wang.
11
Outline: Introduction; Background; Method; Experiment; Analysis
12
Background: Direct pairwise sequence comparison has proved effective for classification. Performance drops when detecting subtle, remotely homologous sequences. Such sequences share a conserved structure in at least some components. The problem is formulated on the basis of this observation.
13
Protein-motif bipartite network: Each protein contains a set of motifs. Each motif belongs to a set of proteins. These relationships are mapped to a bipartite graph, as shown in the slide figure. The edge weight indicates the probability that motif x is in protein y.
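A bipartite graph of this kind is commonly stored as a proteins-by-motifs weight matrix. The sketch below is one possible representation; the function names, the toy proteins and motifs, and the weights are hypothetical and not taken from the slides.

```python
def build_connectivity(proteins, motifs, edge_weights):
    """Return H as a list of rows; H[y][x] is the weight linking protein y and motif x.

    edge_weights maps (protein, motif) pairs to weights; absent pairs get 0.0.
    """
    protein_index = {name: i for i, name in enumerate(proteins)}
    motif_index = {name: j for j, name in enumerate(motifs)}
    h = [[0.0] * len(motifs) for _ in proteins]
    for (protein, motif), weight in edge_weights.items():
        h[protein_index[protein]][motif_index[motif]] = weight
    return h


def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


# Hypothetical toy data: three proteins, two motifs.
proteins = ["p1", "p2", "p3"]
motifs = ["m1", "m2"]
edge_weights = {
    ("p1", "m1"): 0.8,
    ("p2", "m1"): 0.4,
    ("p2", "m2"): 0.6,
    ("p3", "m2"): 0.9,
}
H = build_connectivity(proteins, motifs, edge_weights)
H_rows = row_normalize(H)  # each protein's motif weights now sum to 1
```

Reading a row of `H` gives the motifs of one protein; reading a column gives the proteins that share one motif.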
14
MotifProp Algorithm: Set P represents the protein sequences and set F represents the motifs. H is the connectivity matrix. The algorithm uses a row-normalized version of H, a vector of initial values for the motif nodes, and a vector of initial values for the protein nodes P. [Symbols not preserved in the transcript.]
15
MotifProp Algorithm (cont): The convergence of MotifProp is guaranteed. The problem is reformulated based on an update rule [equation not preserved in the transcript], which is defined in terms of the row-normalized version of H and the vectors of initial values for the motif nodes and the protein nodes.
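Since the update rule itself did not survive in the transcript, the following is only a sketch of one propagation scheme on a protein-motif bipartite graph that is consistent with the description (row-normalized connectivity, initial-value vectors, guaranteed convergence). The alternating update, the damping factor `alpha`, and the toy matrix are assumptions.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def propagate(h, p_init, f_init, alpha=0.5, tol=1e-10, max_iter=10_000):
    """Alternate activation between protein nodes and motif nodes.

    Assumed update (not taken from the slides):
        p <- p_init + alpha * H_p f     (H_p: row-normalized H, proteins x motifs)
        f <- f_init + alpha * H_f p     (H_f: row-normalized H^T, motifs x proteins)
    With alpha < 1 the combined update is a contraction, so it converges.
    """
    h_p = row_normalize(h)
    h_f = row_normalize(transpose(h))
    p, f = list(p_init), list(f_init)
    for _ in range(max_iter):
        p_new = [p0 + alpha * v for p0, v in zip(p_init, mat_vec(h_p, f))]
        f_new = [f0 + alpha * v for f0, v in zip(f_init, mat_vec(h_f, p))]
        change = max(abs(a - b) for a, b in zip(p_new + f_new, p + f))
        p, f = p_new, f_new
        if change < tol:
            break
    return p, f


# Hypothetical toy graph: protein 0 is the query and shares motif 0 with protein 1;
# protein 1 shares motif 1 with protein 2.
toy_h = [
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
]
protein_scores, motif_scores = propagate(toy_h, p_init=[1.0, 0.0, 0.0], f_init=[0.0, 0.0])
```

Protein 2 shares no motif with the query, yet it still receives a positive score through protein 1, which is the behavior the propagation is meant to capture.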
16
Edge weighting scheme: A PSI-BLAST E-value is assigned to each pair of protein nodes. Gaussian edge weights are calculated from these E-values. The Gaussian weights from the query to each protein are used as the initial values.
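The slide does not state the weight formula, so the sketch below assumes a squared-exponential (Gaussian) form with a hypothetical width parameter `sigma`; the E-values are made up for illustration. It shows how E-values could be turned into weights and how the query's weights become the initial activation vector.

```python
import math


def gaussian_weight(e_value, sigma=1.0):
    """Map an E-value to a weight in (0, 1]; smaller E-values give larger weights.

    Assumed form: exp(-E^2 / (2 * sigma^2)). The actual formula is not in the slides.
    """
    return math.exp(-(e_value ** 2) / (2.0 * sigma ** 2))


def initial_activation(query_e_values, sigma=1.0):
    """Turn the query's E-value against each protein into an initial score."""
    return [gaussian_weight(e, sigma) for e in query_e_values]


# Hypothetical E-values of a query against four database proteins.
query_e_values = [1e-30, 0.01, 1.0, 10.0]
start = initial_activation(query_e_values, sigma=1.0)
```

A larger `sigma` keeps weaker hits in play, while a smaller `sigma` concentrates the initial activation on near-certain matches.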
17
Value estimation: S_q(i) is the E-value of protein i and query q. E_q(j) is the E-value of the jth motif and ith protein. [Equation (1) not preserved in the transcript.]
18
Estimation of substitution score: The substitution score between a k-mer f and a sequence x can be estimated as [equation not preserved in the transcript], where s_l is a log value. A score S below the threshold indicates that the k-mer can be a motif hit against sequence x.
19
Sequential MotifProp: Empirical experiments suggest that using a weighted linear combination of multiple motifs does not improve the results, so a simple scheme with multiple motif sets is applied. The motif nodes F are partitioned into n sets, where F(i) is the set of motifs from the ith motif set. F then represents the motif sets rather than individual motifs.
20
Motif-rich regions
21
Experiments: 7329 protein domains with known 3D structure from SCOP are divided into training (4246) and testing (3083) sets. An additional 10602 sequences from the Swiss-Prot database are used. Evaluation is based on the ROC curve.
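ROC-based evaluation of a ranking can be summarized by the area under the ROC curve (AUC). The sketch below computes AUC by pairwise comparison of scores; the function name and the toy scores and labels are hypothetical, and the slides do not say which ROC variant or cutoff was used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (same superfamily
    as the query) is scored above a randomly chosen negative; ties count 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical ranking scores and ground-truth labels (1 = true homolog).
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Here `perfect` is 1.0 because every homolog outranks every non-homolog, and `mixed` is 0.75 because one of the four positive-negative pairs is ordered incorrectly.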
22
Results of classification
23
Results of classification (cont)
24
Results on Motif rich region
25
Conclusion: Two methods for protein classification based on protein ranking are presented. The similarity matrix and the protein/motif propagation network are the underlying structures. The methods are simple, but the formulation is innovative. Results are better than those of current approaches. Analysis of the results plays an important role.