Published by Homer Jefferson. Modified over 9 years ago.
1
Inferring Functional Information from Domain Co-evolution
Yohan Kim, Mehmet Koyuturk, Umut Topkara, Ananth Grama and Shankar Subramaniam
Gaurav Chadha, Deepak Desore
2
Layout
Motivation
Computational Methods and Algorithms
Results
Conclusion
Questions
3
Motivation (1 of 2)
Prior work focused on understanding protein function at the level of entire protein sequences.
Assumption: the complete sequence follows a single evolutionary trajectory.
It is well known that a domain can exist in various contexts, which invalidates this assumption for multi-domain protein sequences.
4
Motivation (2 of 2)
Our approach:
Improves on the multiple profile method.
Constructs a co-evolutionary matrix to assign phylogenetic similarity scores to each protein pair.
Identifies co-evolving regions using residue-level conservation.
5
Computational Methods & Algorithms
Constructing phylogenetic profiles: protein (single), segment (multiple), and residue phylogenetic profiles.
Computing co-evolutionary matrices.
Deriving phylogenetic similarity scores.
6
Protein phylogenetic profiles
A phylogenetic profile is a vector that records the presence or absence of a protein in each genome.
Let $P = \{P_1, P_2, \ldots, P_n\}$ be the set of proteins and $G = \{G_1, G_2, \ldots, G_m\}$ be the set of genomes.
Every row represents the binary phylogenetic profile of a protein.
7
Protein phylogenetic profiles (contd.)
The single phylogenetic profile $\psi_i$ for protein $P_i$ is $\psi_i(j) = -\frac{1}{\log(E_{ij})}$ for $1 \le j \le m$, where $E_{ij}$ is the minimum BLAST E-value of a local alignment between $P_i$ and $G_j$.
Advantage: the profile reflects the degree of sequence divergence.
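The profile definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not give the logarithm base or say how missing or weak hits are handled, so the sketch uses base 10, scores E-values of 1 or more (or no hit at all) as 1.0, scores a reported E-value of 0 as 0.0, and caps every score at 1.0. The function name `protein_profile` is hypothetical.

```python
import math

def protein_profile(e_values):
    """Return psi_i(j) = -1 / log10(E_ij) for one protein across m genomes.

    e_values holds the minimum BLAST E-value per genome, or None for no hit.
    Assumptions (not from the slides): log base 10, scores clamped to [0, 1].
    """
    profile = []
    for e in e_values:
        if e is None or e >= 1.0:
            score = 1.0   # assumption: no significant homolog in this genome
        elif e <= 0.0:
            score = 0.0   # assumption: E-value reported as 0, strongest possible hit
        else:
            score = min(1.0, -1.0 / math.log10(e))
        profile.append(score)
    return profile

# Toy E-values for one protein against four genomes.
profile = protein_profile([1e-100, 1e-10, 0.5, None])
```

Under these assumptions, strong hits map to scores near 0 and absent homologs map to 1, so the profile keeps a graded signal instead of a yes/no flag.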
8
Protein phylogenetic profiles (contd.)
Mutual information $I(X,Y)$ is defined as $I(X,Y) = H(X) + H(Y) - H(X,Y)$, where the Shannon entropy of $X$ is $H(X) = -\sum_{x \in X} p_x \log(p_x)$ with $p_x = P[X = x]$.
The phylogenetic similarity between $\psi_i$ and $\psi_j$ is $\mu_s(P_i, P_j) = I(\psi_i, \psi_j)$.
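A minimal sketch of the mutual information computation between two profiles follows. The slides do not say how real-valued profile entries are turned into discrete symbols, so the equal-width binning into 10 bins over [0, 1] is an assumption, as are the function names and the use of base-2 logarithms.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy H(X) = -sum p_x * log2(p_x) over observed symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def discretize(profile, bins=10):
    """Assumption: profile values lie in [0, 1]; use equal-width bins."""
    return [min(int(v * bins), bins - 1) for v in profile]

def mutual_information(profile_a, profile_b, bins=10):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) on the discretized profiles."""
    a = discretize(profile_a, bins)
    b = discretize(profile_b, bins)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

x = [0.0, 1.0, 0.0, 1.0]
mi_same = mutual_information(x, x)            # identical profiles
mi_const = mutual_information(x, [0.5] * 4)   # second profile carries no information
```

Identical profiles share all of their information, while a constant profile shares none, which matches the intuition that co-occurrence patterns drive the score.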
9
Segment phylogenetic profiles
Single-profile methods can miss significant interactions.
Domain $D_1^2$ of $P_2$ follows an evolutionary trajectory similar to that of $P_1$ and $P_3$, which the single profile method did not capture.
10
Segment phylogenetic profiles (contd.)
Each protein $P_i$ is divided into fixed-size segments $S_1^i, S_2^i, \ldots, S_k^i$.
The phylogenetic similarity between two proteins is $\mu_M(P_i, P_j) = \max_{s,t} I(\psi_s^i, \psi_t^j)$, where $\psi_s^i$ is the phylogenetic profile of segment $S_s^i$ of protein $P_i$.
11
Residue phylogenetic profiles
Problem with multiple phylogenetic profiles: both domains are covered together by the segment $S_2^2$, which overrides their individual phylogenetic profiles.
A significant local alignment between two proteins corresponds to the residues covered in the alignment rather than to the whole sequences.
12
Residue phylogenetic profiles (contd.)
$\mathcal{A}(P_i, G_j)$ is the set of significant local alignments between protein $P_i$ and genome $G_j$.
$T(A) = [r_b, r_e]$ is the interval of residues on $P_i$ corresponding to each alignment $A \in \mathcal{A}(P_i, G_j)$.
For each residue $r$ on $P_i$, the phylogenetic profile is $\psi_r^i(j) = \min_{A \in \mathcal{A}_r} \left(-\frac{1}{\log(E(A))}\right)$ for $1 \le j \le m$, where $\mathcal{A}_r = \{A \in \mathcal{A}(P_i, G_j) : r \in T(A)\}$ is the set of local alignments that contain $r$.
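The per-residue definition can be illustrated with a short sketch. Alignments are represented here as hypothetical `(r_begin, r_end, e_value)` tuples with 1-based inclusive residue positions. Residues not covered by any alignment in a genome are scored 1.0, and E-values are scored with a base-10 logarithm clamped to [0, 1]; both are assumptions, since the slides do not specify these details.

```python
import math

def score(e_value):
    """-1 / log10(E), clamped to [0, 1] (clamping and base are assumptions)."""
    if e_value >= 1.0:
        return 1.0
    if e_value <= 0.0:
        return 0.0
    return min(1.0, -1.0 / math.log10(e_value))

def residue_profiles(length, alignments_per_genome):
    """Build psi_r(j) for every residue r of one protein.

    alignments_per_genome[j] lists (r_begin, r_end, e_value) tuples for genome j.
    Returns profiles where profiles[r - 1][j] is the score of residue r in genome j.
    """
    m = len(alignments_per_genome)
    profiles = [[1.0] * m for _ in range(length)]   # assumption: uncovered residue scores 1.0
    for j, alignments in enumerate(alignments_per_genome):
        for r_begin, r_end, e_value in alignments:
            s = score(e_value)
            for r in range(r_begin, r_end + 1):
                # Keep the minimum over all alignments that contain residue r.
                profiles[r - 1][j] = min(profiles[r - 1][j], s)
    return profiles

# A 6-residue protein; genome 0 has two overlapping alignments, genome 1 has none.
profiles = residue_profiles(6, [[(1, 3, 1e-10), (3, 5, 1e-100)], []])
```

Residue 3 lies in both alignments and takes the score of the more significant one, while residue 6 is covered by nothing and keeps the default.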
13
Computing co-evolutionary matrices
For each protein pair $P_i$ and $P_j$ with lengths $l_i$ and $l_j$, the co-evolutionary matrix entry is $M_{ij}(r,s) = I(\psi_r^i, \psi_s^j)$, where $1 \le r \le l_i$ and $1 \le s \le l_j$.
The co-evolutionary matrix contains information about which regions of the two proteins co-evolved.
Co-evolved domains appear as a block of high mutual information scores in the matrix.
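As a rough sketch, the matrix can be filled by computing mutual information for every residue pair. The example assumes the residue profiles have already been discretized into integer symbols, a step the slides do not describe. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(a, b):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for two equal-length symbol sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def coevolution_matrix(residue_profiles_i, residue_profiles_j):
    """M[r][s] = I(psi_r of protein i, psi_s of protein j), 0-based indices."""
    return [[mutual_information(pr, ps) for ps in residue_profiles_j]
            for pr in residue_profiles_i]

# Toy discretized profiles over 4 genomes: 3 residues in protein i, 2 in protein j.
prof_i = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
prof_j = [[0, 1, 0, 1], [0, 0, 1, 1]]
M = coevolution_matrix(prof_i, prof_j)
```

The cost grows with the product of the two protein lengths, which is one plausible reason for the down-sampling factor mentioned in the results.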
14
Deriving phylogenetic similarity scores
The phylogenetic similarity score between two proteins $P_i$ and $P_j$ is $\mu_C(P_i, P_j) = \max_{1 \le r \le l_i,\; 1 \le s \le l_j} \; \min_{r \le a \le r + W,\; s \le b \le s + W} M_{ij}(a,b)$, where $W$ is the window parameter that sets the minimum size of a region on a protein to be considered a conserved domain.
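The max-min score can be sketched directly from its definition. The sketch uses 0-based indices and only considers windows that fit entirely inside the matrix; the slides do not say how windows at the matrix edge are treated, so that restriction is an assumption. The function name `windowed_score` is hypothetical.

```python
def windowed_score(matrix, window):
    """Max over window origins (r, s) of the minimum entry in the
    (window + 1) x (window + 1) block starting at (r, s).

    Assumption: only blocks fully inside the matrix are scored. Returns
    float("-inf") if the matrix is too small to hold a single block.
    """
    rows, cols = len(matrix), len(matrix[0])
    best = float("-inf")
    for r in range(rows - window):
        for s in range(cols - window):
            block_min = min(matrix[a][b]
                            for a in range(r, r + window + 1)
                            for b in range(s, s + window + 1))
            best = max(best, block_min)
    return best

# A 2x2 block of high scores in the middle, plus one isolated high entry.
matrix = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.9, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
block_score = windowed_score(matrix, 1)
single_score = windowed_score(matrix, 0)
```

Taking the minimum inside each window means an isolated high entry cannot win on its own: with `window=1` the contiguous block decides the score, whereas `window=0` reduces to the plain matrix maximum.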
15
Results
Implemented and tested on 4311 E. coli proteins.
152 genomes (131 Bacteria, 17 Archaea, 4 Eukaryota).
Down-sampling factor f = 30 and window W = 2; these values translate into overlapping segments 60 residues long.
Homologous proteins were excluded from the analysis.
The p-value is defined as the fraction of non-homologous protein pairs (N).
16
Results (contd.)
MIS: mutual information score.
PP: number of predicted protein pairs.
PPV = TP / (TP + FP).
For all μ*, coverage = TP + FP.
TN and FN are the numbers of protein pairs that do not meet the threshold.
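The evaluation quantities can be illustrated with a small sketch. It assumes that a pair counts as predicted when its score is at or above the threshold, and that each pair carries a true/false label from some reference set of known interactions. The slides do not state either convention, and the data below is a toy example, not the reported results.

```python
def evaluate(scored_pairs, threshold):
    """Count TP and FP among pairs scoring at or above the threshold.

    scored_pairs: list of (score, is_known_interaction) tuples (hypothetical format).
    """
    tp = sum(1 for s, known in scored_pairs if s >= threshold and known)
    fp = sum(1 for s, known in scored_pairs if s >= threshold and not known)
    coverage = tp + fp                        # number of predicted pairs
    ppv = tp / coverage if coverage else float("nan")
    return {"TP": tp, "FP": fp, "coverage": coverage, "PPV": ppv}

# Toy scores with made-up labels, for illustration only.
pairs = [(0.9, True), (0.8, True), (0.7, False), (0.2, True), (0.1, False)]
result = evaluate(pairs, threshold=0.5)
```

Raising the threshold usually trades coverage for PPV, which is why the comparison between methods is made at a fixed PPV or a fixed number of predicted pairs.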
17
Results (contd.)
The co-evolutionary matrix method has 1.5 times greater coverage at PPV = 0.7 than the single profile method.
At the same number of PP, the co-evolutionary matrix method has better PPV and sensitivity values than the single profile method.
18
Results (contd.)
Mutual information score distribution for interacting and non-interacting protein pairs.
At an MIS of 0, SP (single profile) shows a peak while CM (co-evolutionary matrix) does not. In other words, at low MIS values SP has more protein pairs than CM.
19
Results (contd.)
The plot compares the p-values of the single profile method with those of the co-evolutionary matrix method.
The scattered circles show that the two methods can give very different predictions.
20
Results (contd.) – Phosphotransferase system
Domain IIA (residues 1-170) and domain IIB (residues 170-320).
The darker region shows that the domains have co-evolved, so we can conclude that IIB evolved with IIC rather than IIA.
Top-20 predicted interacting partners of protein IIAB for both methods.
21
Results (contd.) - Chemotaxis
The N-terminus of CheA (residues 1-200) and the C-terminus of CheA (residues 540-670) co-evolved with the C-terminal region of CheB (residues 170-340).
Top-20 predicted interacting partners of protein CheA using both methods.
22
Results (contd.) – Kdp System
The N-terminal domain of KdpD (residues 1-395) co-evolved with KdpC.
Top-10 predicted interacting partners of protein KdpD using both methods.
23
Conclusion
The results in this paper strongly suggest that co-evolution of proteins should be captured at the domain level, because domains with conflicting evolutionary histories can co-exist in a single protein sequence.
Regions that are important for supporting both functional and physical interactions between proteins can be detected.
24
Questions
Thank You!!