Functional Coherence in Domain Interaction Networks


Prof. Ananth Grama

Dept. of Computer Science, Purdue University

Outline
- Motivation
  - Protein and domain interaction networks
- Formal framework
  - Properties for term- and set-similarity measures
  - New similarity metric
- Results
  - Comparison of measures
  - Comparison of PPI and DDI networks

Motivation
- Extracting functional information from protein-protein interactions
  - Data from high-throughput experiments are noisy, incomplete, generic, and static
- Typical proteins are composed of multiple domains
  - Each domain is an independent unit (function, evolution, folding)
- Behind protein-protein interactions there are protein domains interacting physically with one another
- Understanding protein interactions at the domain level gives a global view of the protein interaction network and possibly of protein functions
[Figure: proteins p1 and p2, domains d1 to d4, with a domain-domain interaction]

Motivation
- How does functional modularity manifest itself in a network of molecular interactions?
  - Explore the relationship between functional similarity and network proximity
- Functional annotations available for domains and proteins differ vastly
  - Annotation for domains is derived from proteins; as such, it is more general, scarce, and incomplete
- Do current similarity measures work in an unbiased manner, given the incompleteness of annotation?
- Are they statistically meaningful and biologically interpretable?

Formal Framework
- C = { ci | 0 ≤ i < N } is a finite partially ordered set of concepts (the ontology)
- Concepts are related by a binary relationship, denoted ⪯, e.g. c3 ⪯ c1, c6 ⪯ c3, c5 ⪯ r
- Set of ancestors: Ai = { ck | ci ⪯ ck }
- Two concepts (ci, cj) are comparable (~) if either ci ⪯ cj or cj ⪯ ci
- Not all concepts in Ai need be comparable, because the ontology is a DAG (as opposed to a tree)
[Figure: example ontology DAG with nodes C0 = r, C1, C2, C3, C4, C5, C6]
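The ancestor sets and the comparability test in this framework can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the partial order is written ⪯ here, and the parent lists are hypothetical. Only the relations c3 ⪯ c1, c6 ⪯ c3 and c5 ⪯ r are taken from the slide; every other edge is assumed so that the example is a DAG rather than a tree.

```python
from typing import Dict, Set

# Hypothetical ontology: child -> set of direct parents.
# Only c3 <= c1, c6 <= c3 and c5 <= r come from the slide; other edges are assumed.
PARENTS: Dict[str, Set[str]] = {
    "c0": set(),          # c0 = r, the root
    "c1": {"c0"},
    "c2": {"c0"},
    "c3": {"c1", "c2"},   # two parents, so this is a DAG and not a tree
    "c4": {"c2"},
    "c5": {"c2"},
    "c6": {"c3"},
}


def ancestors(c: str, parents: Dict[str, Set[str]] = PARENTS) -> Set[str]:
    """Return A_c = {ck | c <= ck}; c itself is included (the order is reflexive)."""
    seen = {c}
    stack = [c]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def comparable(ci: str, cj: str, parents: Dict[str, Set[str]] = PARENTS) -> bool:
    """Two concepts are comparable if one of them is an ancestor of the other."""
    return cj in ancestors(ci, parents) or ci in ancestors(cj, parents)
```

In this assumed DAG, c1 and c2 both belong to the ancestor set of c6, yet `comparable("c1", "c2")` is false, which is the situation the last bullet of the slide describes.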

Properties for term-similarity
- Similarity (δ) of two terms is based on the underlying taxonomical relationship
- A term-similarity measure should satisfy four properties:
  1. It is symmetric.
  2. More specific terms should have at least as much self-similarity as more general terms.
  3. A term should not be less similar to itself than to any other term.
  4. Terms with more specific common ancestors should be more similar to each other than terms with less specific common ancestors.
- Existing measures
  - Distance based: count the number of edges between the nodes
    δE(ci,cj) = 2*MAX - min[len(ci,cj)]
  - Fails property (4), as distance is uniform over all edges
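The edge-counting measure and its failure of property (4) can be illustrated with a small sketch. Several things here are assumptions rather than content from the slides: the toy ontology is hypothetical, edges are treated as undirected when measuring path length, and MAX is taken to be the maximum depth of the taxonomy (the slide does not define it).

```python
from collections import deque

# Hypothetical ontology: child -> set of direct parents ("c0" is the root r).
PARENTS = {
    "c0": set(),
    "c1": {"c0"},
    "c2": {"c0"},
    "c3": {"c1", "c2"},
    "c4": {"c2"},
    "c5": {"c2"},
    "c6": {"c3"},
}


def undirected_adjacency(parents):
    """Build an undirected adjacency map from the child -> parents map."""
    adj = {c: set(ps) for c, ps in parents.items()}
    for child, ps in parents.items():
        for parent in ps:
            adj[parent].add(child)
    return adj


def shortest_len(ci, cj, adj):
    """Number of edges on a shortest path between ci and cj (breadth-first search)."""
    dist = {ci: 0}
    queue = deque([ci])
    while queue:
        node = queue.popleft()
        if node == cj:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("terms are not connected")


def delta_e(ci, cj, parents=PARENTS, root="c0"):
    """delta_E = 2*MAX - min len(ci, cj); MAX is assumed to be the maximum depth."""
    adj = undirected_adjacency(parents)
    max_depth = max(shortest_len(root, c, adj) for c in parents)
    return 2 * max_depth - shortest_len(ci, cj, adj)
```

In this toy ontology, c4 and c5 share the specific ancestor c2, while c1 and c2 share only the root, yet both pairs are two edges apart and receive the same score. That is the sense in which a uniform edge distance cannot reward more specific common ancestors.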

Existing metrics for term-similarity
- Information content: if Gc is the set of molecules associated with concept c, then
    IC(c) = -log2(|Gc| / |Gr|)
    δR(ci,cj) = max IC(c) over all c ∈ Ai ∩ Aj (c is a common ancestor)
- Normalization:
    δL(ci,cj) = 2 * δR(ci,cj) / (IC(ci) + IC(cj))
- Hybrid approach:
    δJC(ci,cj) = (1 - 2 * δR(ci,cj) + IC(ci) + IC(cj))^(-1)
- All three satisfy the term-similarity properties
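The three information-content measures can be computed directly from ancestor sets and annotation counts. The sketch below uses a hypothetical four-term ontology and made-up counts |Gc| chosen as powers of two so the arithmetic is easy to follow. Returning 1.0 from `lin` when both terms have zero information content is an assumption made here to avoid division by zero, not something stated on the slide.

```python
import math

# Hypothetical ancestor sets A_c (each term is its own ancestor) and counts |G_c|.
ANCESTORS = {
    "r": {"r"},
    "a": {"a", "r"},
    "b": {"b", "a", "r"},
    "c": {"c", "a", "r"},
}
COUNTS = {"r": 16, "a": 8, "b": 2, "c": 4}


def ic(term):
    """IC(c) = -log2(|G_c| / |G_r|), written as log2(|G_r| / |G_c|)."""
    return math.log2(COUNTS["r"] / COUNTS[term])


def resnik(ci, cj):
    """delta_R: information content of the most informative common ancestor."""
    return max(ic(c) for c in ANCESTORS[ci] & ANCESTORS[cj])


def lin(ci, cj):
    """delta_L: delta_R normalized by the information content of both terms."""
    denom = ic(ci) + ic(cj)
    if denom == 0:
        return 1.0  # assumption: both terms are the root, treat as identical
    return 2 * resnik(ci, cj) / denom


def jiang_conrath(ci, cj):
    """delta_JC = 1 / (1 - 2*delta_R + IC(ci) + IC(cj))."""
    return 1.0 / (1 - 2 * resnik(ci, cj) + ic(ci) + ic(cj))
```

With these counts, IC(a) = 1, IC(b) = 3 and IC(c) = 2, so for the pair (b, c) the most informative common ancestor is a, giving δR = 1, δL = 2/5 and δJC = 1/4. For identical terms δL and δJC both equal 1.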

Properties for set-similarity
- Let S be a set of concepts; we want a measure ρ(Si, Sj) to assess the semantic similarity of two sets
- A set-similarity measure should satisfy four properties:
  (i) It is symmetric.
  (ii) Adding a common annotation for two molecules should not decrease the similarity between these two molecules.
  (iii) If new annotations are added for a molecule, the similarity of this molecule to any other molecule should not decrease.
  (iv) A set of annotations should be at least as similar to itself as it is to any other set.

Existing metrics for set-similarity
- Average: violates properties (ii), (iii) and (iv)
- Maximum: weakly satisfies (ii)
- Average of maximums: fails properties (ii), (iii) and (iv)
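The formulas for these three measures are not preserved in the transcript, so the sketch below uses their commonly used forms as an assumption: the mean over all term pairs, the maximum over all term pairs, and a best-match average taken in both directions. The term similarity `delta` is a hypothetical Resnik-style lookup table in which a term's self-similarity equals its information content and the two unrelated terms x and y score 0.

```python
from itertools import product


def average(si, sj, delta):
    """Mean term similarity over all pairs (a, b) with a in si and b in sj."""
    return sum(delta(a, b) for a, b in product(si, sj)) / (len(si) * len(sj))


def maximum(si, sj, delta):
    """Largest term similarity over all pairs."""
    return max(delta(a, b) for a, b in product(si, sj))


def average_of_maximums(si, sj, delta):
    """Best-match average (assumed form): each term is matched to its best partner."""
    best_i = [max(delta(a, b) for b in sj) for a in si]
    best_j = [max(delta(a, b) for a in si) for b in sj]
    return (sum(best_i) + sum(best_j)) / (len(si) + len(sj))


# Hypothetical term similarities: self-similarity equals information content.
TABLE = {("x", "x"): 3.0, ("y", "y"): 1.0, ("x", "y"): 0.0}


def delta(a, b):
    """Symmetric lookup into the toy similarity table."""
    return TABLE[(a, b)] if (a, b) in TABLE else TABLE[(b, a)]


S = {"x", "y"}
```

With this toy table, `average(S, S, delta)` is 1.0 while `average(S, {"x"}, delta)` is 1.5, so under the average measure the set S is less similar to itself than to one of its own subsets. This is one concrete way property (iv) can fail.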

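IC-based set similarity
- Extend the notion of minimum common ancestor (λ) to sets of terms [equation not preserved in the transcript]
- The information content of a set is defined as [equation not preserved in the transcript], where the set in the formula is the one associated with all terms in the MCA of Si, Sj
- This measure satisfies all four properties and can be extended [remainder of the sentence not preserved in the transcript]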

Datasets
- Protein-protein interactions
  - Physical interactions extracted from the BioGRID database
  - Binary data (no reliability score)
- Domain-domain interactions
  - DOMINE database
  - Confidence score used to split the dataset:
    - Struct: only structure-based interactions
    - HC+NA: High Confidence (HC) and structure-based interactions
    - HC+MC: High Confidence (HC) and Medium Confidence (MC) interactions
    - Comp-2: interactions predicted by at least two computational approaches
    - Comp-1: interactions predicted by at least one computational approach

Comparison of Semantic Similarity Measures
- Negative relation between network distance and functional similarity
- The proposed information-content-based measure (ρJC) provides the sharpest decline in semantic similarity for distance < 4
- Method: for each network, we compute the distance between all pairs of molecules (proteins or domains) in the network. Then we group molecule pairs according to their distance and compute the average semantic similarity for each group.
[Figure: C. elegans PPI network]
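The analysis procedure described on this slide (all-pairs network distance, grouping pairs by distance, averaging similarity per group) can be sketched as follows. The four-node path network and the similarity values are hypothetical placeholders; in the actual study the similarity function would be one of the semantic similarity measures and the network a PPI or DDI network.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_similarity_by_distance(adj, similarity):
    """Group unordered molecule pairs by network distance and average their similarity."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if u < v:  # count each unordered pair once; unreachable pairs are skipped
                totals[d] += similarity(u, v)
                counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


# Hypothetical toy network: a path p1 - p2 - p3 - p4.
ADJ = {"p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2", "p4"}, "p4": {"p3"}}

# Hypothetical pairwise semantic similarities, keyed by sorted pair.
SIM = {
    ("p1", "p2"): 0.9, ("p2", "p3"): 0.7, ("p3", "p4"): 0.8,
    ("p1", "p3"): 0.5, ("p2", "p4"): 0.3,
    ("p1", "p4"): 0.1,
}

result = mean_similarity_by_distance(ADJ, lambda u, v: SIM[(u, v)])
```

On this toy input the averages fall from about 0.8 at distance 1 to 0.4 at distance 2 and 0.1 at distance 3, but only because the placeholder similarities were chosen that way. The sketch shows the grouping step, not a result.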

Comparison of Semantic Similarity Measures
- The proposed metric (ρJC) gives a large similarity score to a larger fraction of pairs at close distances (1, 2), and a low similarity score to a large fraction of pairs at distance > 2
- The figure compares the distribution of semantic similarity scores for the average information content (resnik) and self-normalized information content (rho_JC) measures
[Figure: Structural DDI network]

Comparison of PPI and DDI Networks
- We compare the relationship between network proximity and functional similarity comprehensively, using several PPI and DDI networks
[Figure: relation between network proximity and semantic similarity with respect to molecular function]

Comparison of PPI and DDI Networks
[Figure: relation between network proximity and semantic similarity with respect to biological process]

Comparison of PPI and DDI Networks
- Immediate and indirect neighbors perform similar functions
- Functional similarity is stronger in the Struct DDI network
- After normalization, the relationship between functional similarity and network distance is stronger in computationally inferred DDI networks than in PPI networks
- Network proximity in DDI networks is likely to be a better indicator of functional modularity than network proximity in PPI networks
- DDI networks that are based on structural information are relatively more reliable than PPI networks, which may come from noisy high-throughput screening

Summary
- We present necessary properties for any admissible metric for term- and set-similarity
- Current metrics are not admissible, so we develop a new metric for set-similarity
- The proposed metric provides a highly intuitive biological interpretation
- Comprehensive comparative analysis of PPIs and DDIs validates the role of DDIs in quantifying functional coherence