1
University at Buffalo, The State University of New York
Identification of Functional Modules and Hub Proteins in Protein Interaction Networks
Young-Rae Cho
Department of Computer Science and Engineering, State University of New York at Buffalo
Seminar 2009
2
What is Bioinformatics?
Bioinformatics: an interdisciplinary research area to manage and analyze biological data, drawing on Computer Science, Information Science, Molecular Biology, Biochemistry, Biophysics, Biostatistics, Medical Science and Pharmaceutical Science
Three components: Data, Techniques, Applications
3
What is Bioinformatics?
Biological Data (Genome, Proteome, Networks) → Computational Techniques (Data Mining, Machine Learning) → Knowledge (Functional Characterization) → Biomedical Applications (Disease Diagnosis, Drug Development)
Focus of this talk: Networks → Data Mining → Functional Characterization
4
Overview
Introduction: Protein Interaction Networks and Their Structural Properties
Preprocess - Network Weighting: Integration of Gene Ontology using Semantic Similarity Measures
Functional Module Identification: Weighted Interaction Networks → Functional Modules
Hub Protein Identification: Weighted Interaction Networks → Hub Proteins
Conclusion
5
Biological Network
Definition: a directed or undirected graph representation, with biological molecules as nodes and biochemical reactions or biophysical interactions as edges
Examples: metabolic networks, signal transduction networks, gene regulatory networks, protein interaction networks
Importance: provides a global view of cellular organization and biological processes; applicable to systematic approaches to knowledge discovery
6
Protein-Protein Interaction (PPI)
Biological meaning of PPI: proteins interact with each other for stability and functionality; most cellular functions are performed at the protein complex level; interaction evidence is interpreted as functional coherence / consistency
Determination of PPIs:
Experimental methods: yeast two-hybrid systems, mass spectrometry, protein microarrays
Computational methods: homology search, gene fusion analysis, phylogenetic profiles
Problem of PPI data: current PPI databases include a large number of false positives and false negatives → unreliability
7
Protein Interaction Network
Representation of protein interaction networks: an undirected, un-weighted graph G(V,E), with a set of nodes V as proteins and a set of edges E as interactions
Problems of protein interaction networks: large scale; complex connectivity
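Here is a minimal sketch of the representation. It is written in Python, which is an assumption because the slides contain no code. The undirected, un-weighted graph G(V,E) is held as an adjacency-set dictionary built from a list of interacting protein pairs. The protein names P1 to P4 and the edge list are hypothetical, and ignoring self-interactions is a simplifying choice of this sketch.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected, un-weighted graph G(V, E) as {protein: set of partners}."""
    graph = defaultdict(set)
    for u, v in interactions:
        if u == v:
            continue  # this sketch ignores self-interactions
        graph[u].add(v)  # store each interaction in both directions,
        graph[v].add(u)  # because the graph is undirected
    return dict(graph)


# Hypothetical interaction list; (P2, P1) duplicates (P1, P2) and collapses into one edge
edges = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
g = build_network(edges)
num_nodes = len(g)                                # |V|
num_edges = sum(len(p) for p in g.values()) // 2  # |E|: each edge is stored twice
```

Adjacency sets make neighbor lookups and "is u connected to v" tests constant time on average. This suits the sparse, large-scale networks described on the slide.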
8
Structural Properties
Small-world phenomenon (Watts & Strogatz): networks lie between regular and random networks; the average clustering coefficient is higher than expected by random chance, and the average shortest path length is small
Scale-free distribution (Barabási & Albert): network growth by preferential attachment; power-law degree distribution P(k) ∝ k^(−γ), with a few high-degree nodes and many low-degree nodes; clustering coefficient distribution independent of degree
Protein interaction databases (DIP / MIPS):
Density: 0.0015
Average clustering coefficient: 0.2283 / 0.2878
Average shortest path length: 4.14 / 4.43
Degree distribution exponent (γ): 1.77 / 1.64
Further properties: high modularity; existence of hubs
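The structural measures in the table can be computed on any adjacency-set graph. The sketch below uses a hypothetical four-protein graph and computes density, the average clustering coefficient, and the average shortest path length (by breadth-first search). Two conventions are assumptions of this sketch, not taken from the slides:
- Nodes with degree below 2 get a clustering coefficient of 0.
- Path lengths are averaged over mutually reachable pairs only, since PPI networks are usually not connected.

```python
from collections import deque
from itertools import combinations


def density(graph):
    """2|E| / (|V|(|V|-1)) for an undirected simple graph."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for degree-0/1 nodes
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)


def average_shortest_path_length(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical example: a triangle P1-P2-P3 with P4 attached to P3
g = {"P1": {"P2", "P3"}, "P2": {"P1", "P3"},
     "P3": {"P1", "P2", "P4"}, "P4": {"P3"}}
print(density(g), average_clustering(g), average_shortest_path_length(g))
```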
10
Network Weighting Schemes
Motivation: protein interaction networks are unreliable, so the un-weighted graph is transformed into a weighted graph by assigning the interaction reliability (or intensity) to each edge as a weight
Unsupervised approaches: use network connectivity, e.g., common neighbors, alternative paths. Problem: the resulting weights are unreliable
Supervised approaches: use other resources that verify interactions, e.g., gene sequence, gene expression
This work integrates Gene Ontology data, the most comprehensive and well-curated resource
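One simple unsupervised, connectivity-based weighting is sketched below. Each edge (u, v) gets the Jaccard overlap of the closed neighborhoods N(u) ∪ {u} and N(v) ∪ {v}. The slide names common neighbors only in general terms, so this formula is an assumption chosen for illustration. It is not the weighting used in this work, which relies on Gene Ontology. The graph is the same hypothetical triangle-plus-tail example used for illustration.

```python
def common_neighbor_weights(graph):
    """Weight each edge by the Jaccard overlap of its endpoints' closed neighborhoods.

    graph: {protein: set of interacting proteins}, undirected.
    Returns {frozenset({u, v}): weight in (0, 1]}.
    """
    weights = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            edge = frozenset((u, v))
            if edge in weights:
                continue  # each undirected edge is visited from both endpoints
            nu = nbrs | {u}       # closed neighborhood of u
            nv = graph[v] | {v}   # closed neighborhood of v
            weights[edge] = len(nu & nv) / len(nu | nv)
    return weights


g = {"P1": {"P2", "P3"}, "P2": {"P1", "P3"},
     "P3": {"P1", "P2", "P4"}, "P4": {"P3"}}
w = common_neighbor_weights(g)
```

Edges inside the triangle score higher than the pendant edge P3-P4. This shows how shared partners are taken as supporting evidence. The slide's caveat still applies: in a noisy network, the neighborhoods themselves contain false positives.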
11
Gene Ontology (GO)
Structure: terms (concepts) with well-defined biological descriptions; “is-a” / “part-of” relationships between terms (general to specific); GO is a DAG, so annotation is transitive
Annotation: if a protein is annotated to a term, it is also annotated to all terms on the paths toward the root
[Figure: example GO DAG fragment (terms including cell growth & maintenance, cell organization, cytoplasm organization, mitochondrion organization, ribosome biogenesis, metabolism, nucleic acid metabolism, RNA metabolism, RNA processing, transcription, DNA-dependent transcription and rRNA processing), each term labeled with its annotated proteins among P1 to P6]
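The transitive annotation rule on this slide can be sketched as a DAG traversal: each direct annotation is copied to every ancestor term. The four-term DAG (root A, with D below both B and C) and the proteins P1 to P3 are hypothetical. They are not the GO fragment shown on the slide.

```python
def ancestors(parents, term):
    """Return the term itself plus every term on the paths from it toward the root."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate_annotations(parents, direct):
    """A protein annotated to a term is also annotated to all of the term's ancestors.

    parents: {term: set of parent terms} ("is-a" / "part-of" edges, child to parent)
    direct:  {protein: set of terms it is directly annotated to}
    Returns {term: set of annotated proteins}.
    """
    annotated = {}
    for protein, terms in direct.items():
        for t in terms:
            for a in ancestors(parents, t):
                annotated.setdefault(a, set()).add(protein)
    return annotated


# Hypothetical DAG: A is the root, D has two parents (B and C)
parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
direct = {"P1": {"D"}, "P2": {"B"}, "P3": {"C"}}
annotated = propagate_annotations(parents, direct)
```

Because D has two parents, P1 reaches the root along two paths. The `seen` set ensures each ancestor is visited only once.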
12
Semantic Similarity
Reliability of interacting proteins: the average (or maximum) semantic similarity over pairs of GO terms annotated to the two interacting proteins
Structure-based approaches: path length or common parent terms. Problem: they assume that all edges in the GO DAG represent a uniform specificity
Information content-based approaches: the information content of a term T is defined as −log P(T), and

$$\mathrm{sim}_{xy} = -\log P_i(x,y)$$

where P_i(x,y) is the proportion of annotations belonging to the term i that includes both x and y. The normalized form is

$$\mathrm{sim}_{xy} = \frac{2 \times \log P_{ij}(x,y)}{\log P_i(x) + \log P_j(y)}$$
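The first information-content measure has the same form as Resnik's similarity: the information content of the most informative common ancestor. The normalized one has the same form as Lin's measure. The sketch below applies both to a hypothetical, already-propagated DAG with a single root. It rests on these assumptions:
- P(T) is the fraction of annotated proteins that are annotated to T.
- Term pairs are combined by taking the maximum, one of the two options on the slide.
- The function names are illustrative.

```python
import math


def ancestors(parents, term):
    """The term itself plus every term reachable toward the root."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def p_term(annotated, total, term):
    """P(T): fraction of all annotated proteins that are annotated to T."""
    return len(annotated[term]) / total


def resnik(parents, annotated, total, ti, tj):
    """-log P of the most informative common ancestor of ti and tj."""
    common = ancestors(parents, ti) & ancestors(parents, tj)
    return max(-math.log(p_term(annotated, total, t)) for t in common)


def lin(parents, annotated, total, ti, tj):
    """Normalized form: 2 log P(common) / (log P(ti) + log P(tj))."""
    common = ancestors(parents, ti) & ancestors(parents, tj)
    p_common = min(p_term(annotated, total, t) for t in common)  # most informative
    denom = (math.log(p_term(annotated, total, ti))
             + math.log(p_term(annotated, total, tj)))
    if denom == 0.0:
        return 1.0  # both terms are the root
    return 2 * math.log(p_common) / denom


def protein_similarity(parents, annotated, total, terms_x, terms_y):
    """Maximum term-pair similarity between two proteins' GO annotations."""
    return max(lin(parents, annotated, total, i, j)
               for i in terms_x for j in terms_y)


# Hypothetical propagated annotations over a DAG with root A
parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
annotated = {"A": {"P1", "P2", "P3"}, "B": {"P1", "P2"},
             "C": {"P1", "P3"}, "D": {"P1"}}
total = 3
```

Terms whose only shared ancestor is the root (B and C here) get a normalized similarity of 0. This reflects that the root carries no information. Replacing `max` with an average in `protein_similarity` gives the other variant named on the slide.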
14
Functional Module Identification
Functional module: a set of molecules that participate in the same biological processes or functions; a sub-network with dense intra-connections and sparse inter-connections
Functional module identification → a graph clustering problem
Previous clustering approaches:
Density-based methods, e.g., maximum clique, quasi-clique, clique percolation
Partition-based methods, e.g., restricted neighborhood search, Markov clustering
Hierarchical methods: bottom-up approaches, e.g., distance-based, common neighbors; top-down approaches, e.g., minimum cut, betweenness cut
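Whether a candidate sub-network has dense intra-connections and sparse inter-connections can be checked by counting its internal and outgoing edges. The helper below only scores a given module. It is a hypothetical illustration, not one of the clustering methods listed above.

```python
def module_connectivity(graph, module):
    """Return (intra-module edges, edges leaving the module, intra-module density)."""
    module = set(module)
    intra = inter = 0
    for u in module:
        for v in graph[u]:
            if v in module:
                intra += 1  # internal edge, counted once from each endpoint
            else:
                inter += 1  # boundary edge, counted once from the inside endpoint
    intra //= 2
    n = len(module)
    possible = n * (n - 1) // 2
    return intra, inter, (intra / possible if possible else 0.0)


g = {"P1": {"P2", "P3"}, "P2": {"P1", "P3"},
     "P3": {"P1", "P2", "P4"}, "P4": {"P3"}}
print(module_connectivity(g, {"P1", "P2", "P3"}))  # (3, 1, 1.0)
```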
15
Functional Influence Model
Functional influence: the influence factors are normalized weights and the inverse of degree
Measurements:
Single-path-based method: O(|V| + |E|)
All-path-based method: NP
Random-walk-based method: O(|V|^3) × iterations ≈ O(|V|^4)
Improvement by an efficient algorithm
16
Flow Simulation
Information flow simulation: computation of the functional influence inf_s(x) of s on each x ∈ V, based on random walks
Input: a weighted interaction network and a source node s
Output: the functional influence pattern of s
Algorithm:
1. Initialize inf_s(s)
2. Compute the initial flow f_init(s → y)
3. Update inf_s(y)
4. Compute the flow f_s(y → z)
5. Repeat steps 3 and 4 until f_s(y → z) is less than a threshold θ
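The slide's formulas for steps 2 to 4 did not survive, so the following is only a sketch under stated assumptions:
- Flow leaving a node is split among its neighbors in proportion to normalized edge weights.
- Flow is not sent straight back to the node it came from.
- Each hop after the first is multiplied by a hypothetical attenuation factor `alpha`. The slides list the inverse of degree as an influence factor; `alpha` stands in for it here.

Flows below θ (`theta`) are dropped as soon as they are dequeued, which mirrors the early removal of trivial flow.

```python
from collections import deque


def flow_simulation(graph, source, theta=0.01, alpha=0.5, init=1.0):
    """Sketch of influence propagation from one source node.

    graph: {node: {neighbor: weight}}, undirected and weighted.
    theta: flows below this threshold are dropped (step 5).
    alpha: hypothetical per-hop attenuation (0 < alpha < 1), which
           guarantees that every flow eventually falls below theta.
    Returns {node: accumulated influence inf_s(node)}.
    """
    inf = {source: init}                     # step 1: initialize inf_s(s)
    total = sum(graph[source].values())
    # step 2: initial flow from s, split by normalized edge weight
    queue = deque((source, y, init * w / total) for y, w in graph[source].items())
    while queue:
        x, y, flow = queue.popleft()
        if flow < theta:                      # step 5: remove trivial flow early
            continue
        inf[y] = inf.get(y, 0.0) + flow       # step 3: update inf_s(y)
        # step 4: forward attenuated flow to y's neighbors other than the sender x
        others = {z: w for z, w in graph[y].items() if z != x}
        out_total = sum(others.values())
        for z, w in others.items():
            queue.append((y, z, alpha * flow * w / out_total))
    return inf


# Hypothetical weighted path a-b-c
path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
print(flow_simulation(path, "a"))  # {'a': 1.0, 'b': 1.0, 'c': 0.5}
```

Running the function once per source node yields one influence pattern per node. Those patterns are what the pattern-clustering step groups together.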
17
Lower-level Algorithm
18
Pattern Clustering
Schematic view [Figure: functional influence values computed from a source node S (initialized to 1.0) on the surrounding nodes]
19
Time Complexity
Efficiency: traces only the nodes connected to the source when calculating its functional influence; removes trivial flow (less than θ) as early as possible
Run time: the theoretical upper bound is unknown (it does not depend on the network diameter); potential factors (number of nodes, density, average degree) are tested with synthetic networks
20
Accuracy
Experiment: yeast protein interaction network from DIP; pattern clustering with the pCluster algorithm (Wang et al., SIGMOD 2002)
Evaluation: functional categories and annotations from MIPS; hypergeometric p-value
Result
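The hypergeometric p-value measures how unlikely it is to find at least k members of a functional category in a module by chance. Take a network of N proteins, F of them in the category, and a module of size n that contains k category members. The upper tail is P = Σ_{i=k}^{min(n,F)} C(F,i) C(N−F, n−i) / C(N,n). A minimal sketch follows; the argument names are illustrative.

```python
from math import comb


def hypergeometric_pvalue(N, F, n, k):
    """P(X >= k): probability of at least k category members in a random module.

    N: proteins in the network, F: proteins in the functional category,
    n: module size, k: category members observed in the module.
    """
    # Sum the upper tail directly with exact integers, then divide once
    hits = sum(comb(F, i) * comb(N - F, n - i) for i in range(k, min(n, F) + 1))
    return hits / comb(N, n)
```

A smaller p-value means the module is more strongly enriched for that category than random chance would explain. Summing the upper tail with exact integers avoids the cancellation error of computing 1 minus the lower tail.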
22
Hub Protein Identification
Hub protein: a centrally located node in the modular structure of a protein interaction network (a structural hub); a functionally essential protein
Previous centrality measurements: closeness centrality, betweenness centrality, bridging centrality
23
Functional Influence Model
Functional influence: the influence factors are normalized weights and the inverse of degree
Measurements:
Single-path-based method: O(|V| + |E|)
All-path-based method: NP
Random-walk-based method: O(|V|^3) × iterations ≈ O(|V|^4)
Improvement by a heuristic algorithm
24
Path Strength
Single-path-based path strength
All-path-based path strength: sums up the k-length path strengths over all possible k, using a threshold on the maximum k
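The slide's path-strength formulas did not survive. The sketch below therefore assumes, hypothetically, that the strength of a path is the product of its edge weights, with weights in (0, 1]. Under that assumption:
- The single-path strength is the best path's product. It is found with Dijkstra's algorithm on −log(w), which is non-negative for such weights.
- The all-path strength sums the products over walks of length 1 to `max_k`, where `max_k` plays the role of the threshold on the maximum k.

The frontier-based sum counts walks, which may revisit nodes, rather than strictly simple paths.

```python
import heapq
import math


def single_path_strength(graph, s, t):
    """Max over paths s->t of the product of edge weights (weights in (0, 1])."""
    # Dijkstra on -log(w): the shortest path maximizes the weight product
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return math.exp(-d)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d - math.log(w)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return 0.0  # t is unreachable from s


def all_path_strength(graph, s, t, max_k):
    """Sum of weight products over all walks s->t of length 1..max_k."""
    total = 0.0
    frontier = {s: 1.0}  # summed strength of length-k walks ending at each node
    for _ in range(max_k):
        nxt = {}
        for u, strength in frontier.items():
            for v, w in graph[u].items():
                nxt[v] = nxt.get(v, 0.0) + strength * w
        frontier = nxt
        total += frontier.get(t, 0.0)
    return total


# Hypothetical weighted triangle
g = {"a": {"b": 0.5, "c": 0.2}, "b": {"a": 0.5, "c": 0.5}, "c": {"a": 0.2, "b": 0.5}}
```

In this example the indirect path a-b-c (0.5 × 0.5 = 0.25) beats the direct edge a-c (0.2). The all-path version with `max_k=2` adds both contributions.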
25
Network Conversion
Input: a protein interaction network; output: a hierarchical tree structure
Algorithm: centrality (weighted closeness) of a node a; set of ancestor nodes T(a) of a; parent node p(a) of a
Hub confidence measurement: set of child nodes D(a) of a; set of descendant nodes L_a of a; hub confidence H(a) of a
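The centrality formula is also missing from the slide. As a hedged sketch, weighted closeness can be computed with Dijkstra's algorithm by giving each edge a length of 1/w, so that stronger interactions count as shorter distances. Two choices here are assumptions: the 1/w length and normalizing over reachable nodes. The tree-building steps (ancestors, parent, hub confidence) are not reproduced.

```python
import heapq


def weighted_closeness(graph, a):
    """(reachable nodes - 1) / sum of shortest distances from a, with edge length 1/w."""
    dist = {a: 0.0}
    heap = [(0.0, a)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # already settled with a shorter distance
        done.add(u)
        for v, w in graph[u].items():
            nd = d + 1.0 / w  # hypothetical length: strong edges are short
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical star: hub c linked to x, y, z with weight 1
star = {"c": {"x": 1.0, "y": 1.0, "z": 1.0},
        "x": {"c": 1.0}, "y": {"c": 1.0}, "z": {"c": 1.0}}
ranking = sorted(star, key=lambda n: weighted_closeness(star, n), reverse=True)
```

Ranking nodes by this score places the most central node first. That node is the natural candidate for the root of a hierarchical tree.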
26
Schematic View
Hub confidence: how strongly a node plays the role of a structural hub
It does not depend entirely on the node's hierarchical level in the tree structure
27
Structural Hubs
Top 10 structural hubs in the yeast protein interaction network
Not related to their degree
Each one has several different functions
28
Lethality
Biological essentiality: evaluated by comparison with lethal proteins, whose lethality has been determined by protein knock-out experiments
Result
29
Conclusion
Problems: complex and unreliable connectivity in protein interaction networks
Contributions: reliable network generation by edge weighting; hidden knowledge discovery, e.g., patterns or taxonomy; collaboration with existing computational techniques
Future work: integration with multiple data sources; comparative analysis across organisms
30
Questions?