Exploiting indirect neighbors and topological weight to predict protein function from protein-protein interactions
Hon Nian Chua, Wing-Kin Sung and Limsoon Wong
Motivation
Predicting protein function from protein-protein interaction data.
Previous studies consider only level-1 neighbors.
Can level-2 neighbors play a significant role in this prediction?
Summary of the study
Level-2 neighbors do show functional association.
A significant number of proteins were observed to have functional associations with level-2 neighbors but not with level-1 neighbors.
A prediction algorithm:
1) Weight level-1 and level-2 neighbors based on functional similarity.
2) Give each function a score based on its weighted frequency among the neighbors.
Conventional approaches
Use only direct interactions, i.e. level-1 neighbors.
Consider a radius in the interaction neighborhood network.
Calculate a functional distance and use clustering to form functional classes.
Protein-protein interactions as an undirected graph G = (V, E)
Nodes u and v are proteins; an edge e between them is an interaction.
u and v are level-k neighbors if there is a path with k edges between u and v.
S_k: the set of level-k neighbors.
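The graph definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names (`build_graph`, `level1`, `level2`) and the toy edge list are hypothetical. Note that a protein can be both a level-1 and a level-2 neighbor of the same protein.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # skip self-interactions
        adj[u].add(v)
        adj[v].add(u)
    return adj


def level1(adj, u):
    """Proteins joined to u by a single edge."""
    return set(adj.get(u, ()))


def level2(adj, u):
    """Proteins reachable from u by a path of exactly two edges."""
    reached = set()
    for w in adj.get(u, ()):
        reached |= adj.get(w, set())
    reached.discard(u)  # u -> w -> u is not a path to another protein
    return reached


# Toy network (hypothetical data).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
adj = build_graph(edges)
s1 = level1(adj, "A")
s2 = level2(adj, "A")
print(sorted(s1))       # ['B', 'C']
print(sorted(s2))       # ['B', 'C', 'D']
print(sorted(s2 - s1))  # ['D']  (level-2 only)
```

Here D is reachable from A only through C, which is the kind of indirect neighbor the study asks about.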
Indirect Functional Association
Significance
Out of 4162 annotated proteins, only 1999 (48%) share some function with level-1 neighbors.
Sets of neighborhood pairs
Simple neighbor counting
Discuss M and N:
M: total number of functions predicted
N: total number of functions known
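A minimal sketch of neighbor counting, assuming the usual formulation: rank each function by how often it occurs among a protein's level-1 neighbors and predict the top few. The function names, the `top_k` cutoff, and the toy data are hypothetical. The precision/recall helper assumes K is the number of predicted functions that are also known, so precision = K/M and recall = K/N; the slide itself only names M and N.

```python
from collections import Counter


def neighbor_counting(adj, annotations, protein, top_k=3):
    """Rank functions by their frequency among level-1 neighbors."""
    counts = Counter()
    for neighbor in adj.get(protein, ()):
        counts.update(annotations.get(neighbor, ()))
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [function for function, _ in ranked[:top_k]]


def precision_recall(predicted, known):
    """Return (K/M, K/N) over all proteins with predictions."""
    m = sum(len(set(funcs)) for funcs in predicted.values())
    n = sum(len(set(known.get(p, ()))) for p in predicted)
    k = sum(len(set(funcs) & set(known.get(p, ())))
            for p, funcs in predicted.items())
    return (k / m if m else 0.0, k / n if n else 0.0)


# Toy data (hypothetical).
adj = {"P": {"A", "B", "C"}}
annotations = {"A": {"f1", "f2"}, "B": {"f1"}, "C": {"f3"}}
prediction = neighbor_counting(adj, annotations, "P", top_k=2)
print(prediction)  # ['f1', 'f2']
print(precision_recall({"P": prediction}, {"P": {"f1", "f3"}}))  # (0.5, 0.5)
```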
The Algorithm
1) Functional similarity weight
Previous approaches use the CD-distance between proteins u and v, given by:
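The equation on this slide did not survive extraction. As a sketch, the Czekanowski-Dice (CD) distance is usually stated as D(u, v) = |N_u Δ N_v| / (|N_u ∪ N_v| + |N_u ∩ N_v|), where N_p is assumed to contain p itself together with its level-1 neighbors. The function names and toy network below are hypothetical.

```python
def closed_neighborhood(adj, p):
    """N_p: protein p together with its level-1 neighbors (assumed definition)."""
    return set(adj.get(p, ())) | {p}


def cd_distance(adj, u, v):
    """Czekanowski-Dice distance: 0 for identical neighborhoods, 1 for disjoint ones."""
    n_u = closed_neighborhood(adj, u)
    n_v = closed_neighborhood(adj, v)
    symmetric_difference = n_u ^ n_v
    return len(symmetric_difference) / (len(n_u | n_v) + len(n_u & n_v))


# Toy network (hypothetical): u and v share neighbor a but do not interact.
adj = {
    "u": {"a", "b"},
    "v": {"a", "c"},
    "a": {"u", "v"},
    "b": {"u"},
    "c": {"v"},
}
# N_u = {u, a, b}, N_v = {v, a, c}: difference 4, union 5, intersection 1.
print(round(cd_distance(adj, "u", "v"), 4))  # 0.6667
```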
A simple example
When a fraction x of protein u's neighbors is common to protein v's neighbors, x is proportional to the probability that u's functions are shared with v through the common neighbors (and vice versa for the fraction y of v's neighbors that is common with u's neighbors).
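One way to turn this intuition into a score is to multiply the two fractions. The sketch below assumes the form S(u, v) = [2|N_u ∩ N_v| / (|N_u - N_v| + 2|N_u ∩ N_v|)] × [2|N_u ∩ N_v| / (|N_v - N_u| + 2|N_u ∩ N_v|)], with N_p containing p and its level-1 neighbors. The slide's own equation is not available, so treat this form, the function name, and the toy data as assumptions rather than the authors' exact definition.

```python
def fs_weight(adj, u, v):
    """Product of the two 'shared neighbor' fractions (assumed form, no reliability terms)."""
    n_u = set(adj.get(u, ())) | {u}
    n_v = set(adj.get(v, ())) | {v}
    common = len(n_u & n_v)
    # Fraction x: how much of u's neighborhood is shared with v.
    x = 2 * common / (len(n_u - n_v) + 2 * common)
    # Fraction y: how much of v's neighborhood is shared with u.
    y = 2 * common / (len(n_v - n_u) + 2 * common)
    return x * y


# Toy networks (hypothetical).
no_edge = {"u": {"a", "b"}, "v": {"a", "c"}}
with_edge = {"u": {"a", "b", "v"}, "v": {"a", "c", "u"}}
print(fs_weight(no_edge, "u", "v"))              # 0.25
print(round(fs_weight(with_edge, "u", "v"), 4))  # 0.7347
```

In the second network u and v interact directly, so each appears in the other's neighborhood and the weight rises.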
2) Integrating reliability of experimental sources
The prediction results can be improved by taking differences in the reliability of sources into account. The reliability of the interaction between u and v is estimated as:
i: index of an experimental source
E_uv: set of sources in which the interaction between u and v was observed
n: number of times the interaction between u and v was observed
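The equation itself is missing from the slide. A common way to combine independent sources, and the form assumed here, is r_uv = 1 - Π over i in E_uv of (1 - r_i)^n_i, where r_i is the reliability of source i and n_i is the number of times that source observed the interaction. The per-source reliabilities, source names, and function name below are hypothetical.

```python
def interaction_reliability(source_reliability, observations):
    """Probability that at least one observation of the interaction is correct.

    source_reliability: source -> r_i in [0, 1] (assumed known per source)
    observations: source -> n_i, times this source reported the interaction
    """
    prob_all_wrong = 1.0
    for source, n_times in observations.items():
        prob_all_wrong *= (1.0 - source_reliability[source]) ** n_times
    return 1.0 - prob_all_wrong


# Hypothetical reliabilities for two experimental sources.
source_reliability = {"two_hybrid": 0.5, "affinity_ms": 0.8}
observations = {"two_hybrid": 2, "affinity_ms": 1}
# 1 - (0.5 ** 2) * (0.2 ** 1) = 0.95
print(round(interaction_reliability(source_reliability, observations), 4))
```

Repeated observations and more reliable sources both push the estimate toward 1.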
So the integrated equation becomes:
Transitive functional association
If u is similar to w and w is similar to v, then there can be a similarity between u and v, given by:
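The slide's equation is missing. A sketch under the assumption that the transitive weight takes the larger of the direct weight and the best product through an intermediate neighbor w: S_TR(u, v) = max(S(u, v), max over w of S(u, w) · S(w, v)). The weight table, function name, and protein names are hypothetical.

```python
def transitive_weight(weights, adj, u, v):
    """Max of the direct weight and the best two-step product through a neighbor of u."""

    def s(a, b):
        # Pairwise weights are symmetric, so they are keyed by unordered pair.
        return weights.get(frozenset((a, b)), 0.0)

    best = s(u, v)
    for w in adj.get(u, ()):
        if w != v:
            best = max(best, s(u, w) * s(w, v))
    return best


# Hypothetical similarity weights.
weights = {
    frozenset(("u", "w")): 0.9,
    frozenset(("w", "v")): 0.8,
    frozenset(("u", "v")): 0.1,
}
adj = {"u": {"w", "v"}, "w": {"u", "v"}, "v": {"u", "w"}}
# The indirect route (0.9 * 0.8) beats the weak direct weight (0.1).
print(round(transitive_weight(weights, adj, "u", "v"), 2))  # 0.72
```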
Functional similarity weighted averaging
The likelihood of protein p having function x is computed from:
S_TR(u, v): transitive FS-weight
r_int: fraction of all proteins that share the function under consideration
Sigma(p, x) = 1 if p has function x, else 0
Pi_x: frequency of function x in proteins
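The scoring equation is not on the slide, so the following is a simplified sketch rather than the authors' formula: each level-1 or level-2 neighbor votes for function x with its transitive weight, a background term r_int · Pi_x is added, and the total is normalized by one plus the sum of weights. The normalization, the single pass over neighbors, and all names and values are assumptions.

```python
def function_score(weights, annotations, function, r_int, pi_x):
    """Weighted vote for `function` among neighbors, plus a background prior.

    weights: neighbor -> transitive weight to the query protein
    annotations: neighbor -> set of known functions
    """
    numerator = r_int * pi_x  # background contribution
    normalizer = 1.0
    for neighbor, weight in weights.items():
        normalizer += weight
        if function in annotations.get(neighbor, ()):
            numerator += weight  # indicator is 1 for this neighbor
    return numerator / normalizer


# Hypothetical weights and annotations for one query protein.
weights = {"a": 0.8, "b": 0.2}
annotations = {"a": {"x"}, "b": {"y"}}
# (0.5 * 0.1 + 0.8) / (1 + 0.8 + 0.2) = 0.425
print(round(function_score(weights, annotations, "x", r_int=0.5, pi_x=0.1), 3))
```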
Results
1) Original neighbor counting
2) Neighbor counting with FS-weight
3) The scheme in (2), with level-2 neighbors also considered
Comparison with other schemes
Improvements?
Threshold at level 2.