Functional Coherence in Domain Interaction Networks Prof. Ananth Grama
Dept. of Computer Science, Purdue University

Outline
  Motivation
  Protein and Domain Interaction Networks
  Formal framework
  Properties for term- and set-similarity measures
  New Similarity Metric
  Results
    Comparison of measures
    Comparison of PPI, DDI networks
Motivation
  Extracting functional information from protein-protein interactions
    Noisy, incomplete, generic, static data from high-throughput experiments
  Typical proteins are composed of multiple domains
    Each domain is an independent unit (function, evolution, folding)
  Behind protein-protein interactions there are protein domains interacting physically with one another.
  Understanding protein interactions at the domain level gives a global view of the protein interaction network and possibly of protein functions.
  [Figure: proteins p1 and p2 with domains d1, d2, d3, d4, showing a domain-domain interaction]
Motivation
  How does functional modularity manifest itself in a network of molecular interactions?
    Explore the relationship between functional similarity and network proximity
  Functional annotations available for domains and proteins differ vastly
    Annotation for domains is derived from proteins; as such, it is more general, scarce, and incomplete
  Do current similarity measures work in an unbiased manner, given the incompleteness of annotation?
  Are they statistically meaningful and biologically interpretable?
Formal Framework
  $C = \{ c_i \mid 0 \le i < N \}$ is a finite partially ordered set of concepts (the ontology).
  Concepts are related by a binary relation, denoted by $\preceq$, e.g., $c_3 \preceq c_1$, $c_6 \preceq c_3$, $c_5 \preceq r$.
  Set of ancestors: $A_i = \{ c_k \mid c_i \preceq c_k \}$
  Two concepts $(c_i, c_j)$ are comparable ($\sim$) if either $c_i \preceq c_j$ or $c_j \preceq c_i$.
  Not all concepts in $A_i$ need be comparable, because the ontology is a DAG (as opposed to a tree).
  [Figure: example ontology DAG with root $c_0 = r$ and concepts $c_1$ through $c_6$]
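The ancestor sets and the comparability test defined above can be sketched in a few lines of Python. The edge list below is a hypothetical ontology chosen only to agree with the three example relations on the slide (c3 below c1, c6 below c3, c5 below r); the actual figure may differ. Because a partial order is reflexive, each concept is counted among its own ancestors.

```python
# Hypothetical ontology: each concept maps to its direct parents.
# The edges are an assumption for illustration, not taken from the slide figure.
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1"],
    "c4": ["c1", "c2"],
    "c5": ["c2"],
    "c6": ["c3", "c4"],
}


def ancestors(concept, parents=PARENTS):
    """Return A_i: the concept itself plus everything reachable via parent edges."""
    seen = {concept}
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def comparable(ci, cj, parents=PARENTS):
    """Two concepts are comparable if one is an ancestor of the other."""
    return cj in ancestors(ci, parents) or ci in ancestors(cj, parents)


if __name__ == "__main__":
    print(sorted(ancestors("c6")))
    print(comparable("c3", "c4"))
```

In this toy DAG, c3 and c4 are both ancestors of c6 but are not comparable with each other, which is the situation the slide attributes to the ontology being a DAG rather than a tree.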
Properties for term-similarity
  Similarity ($\delta$) of two terms is based on the underlying taxonomical relationship. Desired properties:
    1. The measure should be symmetric.
    2. More specific terms should have at least as much self-similarity as more general terms.
    3. A term should not be less similar to itself than to any other term.
    4. Terms with more specific common ancestors should be more similar to each other than terms with less specific common ancestors.
  Existing measures
    Distance based: count the number of edges between the nodes
      $\delta_E(c_i, c_j) = 2 \cdot \mathrm{MAX} - \min[\mathrm{len}(c_i, c_j)]$
      Fails property (4), as distance is uniform over all edges
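A minimal sketch of the distance-based measure follows. The slide does not define MAX or len; here MAX is assumed to be the maximum depth of the taxonomy and len the number of edges on a path that ignores edge direction, which is the usual reading of edge-counting measures. The ontology is a hypothetical toy DAG.

```python
from collections import deque
from functools import lru_cache

# Hypothetical ontology: each concept maps to its direct parents (assumed edges).
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1"],
    "c4": ["c1", "c2"],
    "c5": ["c2"],
    "c6": ["c3", "c4"],
}


@lru_cache(maxsize=None)
def depth(node):
    """Longest path, in edges, from the root down to node."""
    return 0 if not PARENTS[node] else 1 + max(depth(p) for p in PARENTS[node])


# Assumed meaning of MAX: the maximum depth of the taxonomy.
MAX_DEPTH = max(depth(n) for n in PARENTS)


def path_len(ci, cj):
    """Fewest edges between ci and cj, ignoring edge direction (breadth-first search)."""
    neighbors = {n: set(ps) for n, ps in PARENTS.items()}
    for node, ps in PARENTS.items():
        for p in ps:
            neighbors[p].add(node)
    dist = {ci: 0}
    queue = deque([ci])
    while queue:
        node = queue.popleft()
        if node == cj:
            return dist[node]
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("concepts are not connected")


def delta_E(ci, cj):
    """Edge-counting similarity: 2 * MAX minus the shortest path length."""
    return 2 * MAX_DEPTH - path_len(ci, cj)
```

In this toy DAG, `delta_E("c1", "c2")` and `delta_E("c3", "c4")` come out equal, even though the second pair shares the more specific ancestor c1 while the first pair shares only the root. This is one way to see why uniform edge weights conflict with property (4).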
Existing metrics for term-similarity
  Information content: if $G_c$ is the set of molecules associated with concept $c$, then
    $IC(c) = -\log_2 ( |G_c| / |G_r| )$
    $\delta_R(c_i, c_j) = \max_{c \in A_i \cap A_j} IC(c)$ (the maximum is over common ancestors $c$ of $c_i$ and $c_j$)
  Normalization:
    $\delta_L(c_i, c_j) = 2 \, \delta_R(c_i, c_j) / ( IC(c_i) + IC(c_j) )$
  Hybrid approach:
    $\delta_{JC}(c_i, c_j) = ( 1 - 2 \, \delta_R(c_i, c_j) + IC(c_i) + IC(c_j) )^{-1}$
  All three satisfy the term-similarity properties.
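The three information-content measures can be sketched as follows. The ontology edges and the annotation counts are hypothetical values chosen for illustration; the function names `ic`, `resnik`, `lin`, and `jiang_conrath` are labels used here for IC, δR, δL, and δJC, not names from the talk.

```python
import math

# Hypothetical ontology: each concept maps to its direct parents (assumed edges).
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1"],
    "c4": ["c1", "c2"],
    "c5": ["c2"],
    "c6": ["c3", "c4"],
}

# Hypothetical annotation counts |G_c|; counts grow toward the root because a
# molecule annotated with a term is also associated with that term's ancestors.
COUNTS = {"r": 100, "c1": 40, "c2": 50, "c3": 10, "c4": 20, "c5": 25, "c6": 5}


def ancestors(concept):
    """The concept itself plus everything reachable through parent edges."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(c):
    """Information content: -log2(|G_c| / |G_r|)."""
    return -math.log2(COUNTS[c] / COUNTS["r"])


def resnik(ci, cj):
    """IC of the most informative common ancestor."""
    return max(ic(c) for c in ancestors(ci) & ancestors(cj))


def lin(ci, cj):
    """Resnik similarity normalized by the terms' own information content."""
    denom = ic(ci) + ic(cj)
    # 0/0 arises when both terms are the root; returning 1.0 is a choice of this sketch.
    return 1.0 if denom == 0 else 2 * resnik(ci, cj) / denom


def jiang_conrath(ci, cj):
    """Hybrid measure: inverse of one plus the IC distance through the common ancestor."""
    return 1.0 / (1 - 2 * resnik(ci, cj) + ic(ci) + ic(cj))
```

With these counts, c3 and c4 share c1 as their most informative common ancestor, so `resnik("c3", "c4")` equals `ic("c1")`, while c3 and c5 share only the root and score zero.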
Properties for set-similarity
  Let S be a set of concepts. We want a measure $\rho(S_i, S_j)$ to assess the semantic similarity of two sets:
    (i) The measure should be symmetric.
    (ii) Adding a common annotation for two molecules should not decrease the similarity between these two molecules.
    (iii) If new annotations are added for a new molecule, the similarity of this molecule to any other molecule should not decrease.
    (iv) A set of annotations should be at least as similar to itself as it is to any other set.
Existing metrics for set-similarity
  Average: violates properties (ii), (iii) and (iv)
  Maximum: weakly satisfies (ii)
  Average of maximums: fails properties (ii), (iii) and (iv)
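The formulas for these three measures did not survive on the slide, so the sketch below uses their commonly cited forms: the mean over all term pairs, the maximum over all term pairs, and a best-match average. These definitions, the function names, and the toy similarity table are assumptions for illustration.

```python
from itertools import product


def rho_avg(Si, Sj, delta):
    """Mean term similarity over all pairs drawn from the two sets."""
    return sum(delta(a, b) for a, b in product(Si, Sj)) / (len(Si) * len(Sj))


def rho_max(Si, Sj, delta):
    """Similarity of the single best-matching pair of terms."""
    return max(delta(a, b) for a, b in product(Si, Sj))


def rho_avg_max(Si, Sj, delta):
    """Average, over every term in either set, of its best match in the other set."""
    best_i = sum(max(delta(a, b) for b in Sj) for a in Si)
    best_j = sum(max(delta(a, b) for a in Si) for b in Sj)
    return (best_i + best_j) / (len(Si) + len(Sj))


# Hypothetical symmetric term similarities: x is close to both a and b,
# while a and b are far from each other.
TOY = {
    frozenset(["a"]): 1.0,
    frozenset(["b"]): 1.0,
    frozenset(["x"]): 1.0,
    frozenset(["a", "b"]): 0.1,
    frozenset(["a", "x"]): 0.8,
    frozenset(["b", "x"]): 0.8,
}


def toy_delta(a, b):
    return TOY[frozenset([a, b])]


S, T = {"a", "b"}, {"x"}

if __name__ == "__main__":
    print(rho_avg(S, S, toy_delta), rho_avg(S, T, toy_delta))
```

Under the averaging measure, S scores about 0.55 against itself but 0.8 against T, so a set can be less similar to itself than to another set. This illustrates the kind of failure of property (iv) that the slide reports for the average.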
IC based set similarity
  Extend the notion of minimum common ancestor ($\lambda$) to sets of terms as: [equation missing from source]
  Information content of a set is defined as: [equation missing from source]
    where [symbol missing from source] is the set associated with all terms in the MCA of $S_i$, $S_j$
  This satisfies all 4 properties, can be extended and ...
Datasets
  Protein-protein interactions
    Physical interactions extracted from the BioGRID database
    Binary data (no reliability score)
  Domain-domain interactions
    DOMINE database
    Confidence score used to split the dataset:
      Struct: only structure-based interactions
      HC+NA: High Confidence (HC) and structure-based interactions
      HC+MC: High Confidence (HC) and Medium Confidence (MC) interactions
      Comp-2: interactions predicted by at least two computational approaches
      Comp-1: interactions predicted by at least one computational approach
Comparison of Semantic Similarity Measures
  For each network, we compute the distance between all pairs of molecules (proteins or domains) in the network. Then, we group molecule pairs according to their distance and compute the average semantic similarity for each group.
  Negative relation between network distance and functional similarity
  The proposed information content based measure ($\rho_{JC}$) provides the sharpest decline in semantic similarity for distance < 4
  [Figure: C. elegans PPI network]
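The grouping procedure described above (all-pairs distances, then the average similarity per distance) can be sketched as follows. The four-node network, the annotation sets, and the Jaccard index used as the similarity function are stand-ins for illustration only; they are not the networks or the measures evaluated in the talk.

```python
from collections import defaultdict, deque


def bfs_distances(graph, source):
    """Shortest-path distance, in edges, from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mean_similarity_by_distance(graph, similarity):
    """Group connected node pairs by network distance and average their similarity."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if u < v:  # count each unordered pair once
                totals[d] += similarity(u, v)
                counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


# Hypothetical network (a path p1 - p2 - p3 - p4) and hypothetical annotations.
GRAPH = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}
ANNOTATIONS = {
    "p1": {"f1", "f2"},
    "p2": {"f1", "f2"},
    "p3": {"f2", "f3"},
    "p4": {"f3"},
}


def jaccard(u, v):
    """Stand-in similarity: overlap of the two annotation sets."""
    a, b = ANNOTATIONS[u], ANNOTATIONS[v]
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(mean_similarity_by_distance(GRAPH, jaccard))
```

Any set-similarity function with the same two-argument signature can be passed in place of `jaccard`. Pairs in different connected components are skipped, since they have no finite distance.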
Comparison of Semantic Similarity Measures
  The distribution of semantic similarity scores is compared for the average information content (Resnik) and self-normalized information content ($\rho_{JC}$) measures.
  The proposed metric ($\rho_{JC}$) gives a large similarity score to a larger fraction of pairs at close distances (1, 2), and a low similarity score to a large fraction of pairs at distance > 2.
  [Figure: Structural DDI network]
Comparison of PPI and DDI Networks
  We compare the relationship between network proximity and functional similarity comprehensively, using several PPI and DDI networks.
  [Figure: relation between network proximity and semantic similarity with respect to molecular function]
Comparison of PPI and DDI Networks
  [Figure: relation between network proximity and semantic similarity with respect to biological process]
Comparison of PPI and DDI Networks
  Immediate and indirect neighbors perform similar functions
  Functional similarity is stronger in the Struct DDI network
  After normalization, the relationship between functional similarity and network distance is stronger in computationally inferred DDI networks than in PPI networks
  Network proximity in DDI networks is likely to be a better indicator of functional modularity than that in PPI networks.
  DDI networks that are based on structural information are relatively more reliable than PPI networks, which may come from noisy high-throughput screening.
Summary
  We present necessary properties for any admissible metric for term- and set-similarity
  Current metrics are not admissible; we develop a new metric for set-similarity
  The proposed metric provides a highly intuitive biological interpretation
  A comprehensive comparative analysis of PPIs and DDIs validates the role of DDIs in quantifying functional coherence