University at Buffalo, The State University of New York. Young-Rae Cho, Department of Computer Science and Engineering, State University of New York at Buffalo.

Identification of Functional Modules and Hub Proteins in Protein Interaction Networks
Seminar 2009

What is Bioinformatics?
- Bioinformatics: an interdisciplinary research area to manage and analyze biological data
- Contributing fields: Computer Science, Molecular Biology, Information Science, Medical Science, Pharmaceutical Science, Biochemistry, Biophysics, Biostatistics
- Components: Data, Techniques, Applications

What is Bioinformatics?
- Biological Data: Genome, Proteome, Networks
- Computational Techniques: Data Mining, Machine Learning
- Knowledge
- Biomedical Applications: Functional Characterization, Disease Diagnosis, Drug Development
- Repeated in the diagram: Networks, Data Mining, Functional Characterization

Overview
- Introduction
  - Protein Interaction Networks and Their Structural Properties
- Preprocess - Network Weighting
  - Integration of Gene Ontology using Semantic Similarity Measures
- Functional Module Identification
  - Weighted Interaction Networks → → Functional Modules
- Hub Protein Identification
  - Weighted Interaction Networks → → Hub Proteins
- Conclusion

Biological Network
- Definition
  - Directed or undirected graph representation
  - Biological molecules as nodes and biochemical reactions or biophysical interactions as edges
- Examples
  - Metabolic networks
  - Signal transduction networks
  - Gene regulatory networks
  - Protein interaction networks
- Importance
  - Provide a global view of cellular organization and biological processes
  - Applicable to systematic approaches for knowledge discovery

Protein-Protein Interaction (PPI)
- Biological Meaning of PPI
  - Proteins interact with each other for stability and functionality
  - Most cellular functions are performed at the protein complex level
  - Interaction evidence is interpreted as functional coherence / consistency
- Determination of PPIs
  - Experimental methods: yeast two-hybrid systems, mass spectrometry, protein microarray
  - Computational methods: homology search, gene fusion analysis, phylogenetic profiles
- Problem of PPI data
  - Current PPI databases include a large number of false positives / false negatives → Unreliability

Protein Interaction Network
- Representation of Protein Interaction Networks
  - Undirected, unweighted graph G(V,E), with a set of nodes V as proteins and a set of edges E as interactions
- Problem of Protein Interaction Networks
  - Large scale
  - Complex connectivity

Structural Properties
- Small-world Phenomenon (Watts & Strogatz)
  - Networks that lie between regular and random networks
  - Higher average clustering coefficient than expected by random chance
  - Significantly small average shortest path length
- Scale-free Distribution (Barabasi & Albert)
  - Network growth by preferential attachment
  - Power law degree distribution: a few high degree nodes, many low degree nodes
  - Clustering coefficient distribution independent of degree

Protein Interaction Database        DIP       MIPS
density                             0.0015    (single value in source)
average clustering coefficient      0.2283    0.2878
average shortest path length        4.14      4.43
degree distribution (γ)             1.77      1.64

- High modularity
- Hub existence
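The three structural measures in the table (density, average clustering coefficient, average shortest path length) can be illustrated on a toy graph. This is a minimal sketch, not the code used for the DIP/MIPS figures; the helper names and the four-node example network are assumptions made for illustration, and the path-length average only counts reachable pairs.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def density(adj):
    """Fraction of possible edges that are present: 2|E| / (|V| (|V| - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(density(toy), average_clustering(toy), average_shortest_path_length(toy))
```

A small-world network would show a high average clustering coefficient together with a short average path length, which is the pattern reported for both databases above.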

Network Weighting Schemes
- Motivation
  - Unreliable protein interaction networks
  - Transform the unweighted graph into a weighted graph by assigning the interaction reliability (or intensity) to each edge as a weight
- Unsupervised Approaches
  - Use network connectivity, e.g., common neighbors, alternative paths
  - Problem: unreliable weights
- Supervised Approaches
  - Use other resources that verify interactions, e.g., gene sequence, gene expression
  - My work integrates Gene Ontology data: the most comprehensive, well-curated resource

Gene Ontology (GO)
- Structure
  - Terms (Concepts): well-defined biological description
  - Relationships: "is-a" / "part-of" (general-to-specific) between terms
  - DAG → Transitivity
- Annotation
  - If a protein is annotated on a term, then it is also annotated on the terms on the paths towards the root.

[Figure: example GO DAG with the terms cell growth & maintenance, cell organization, cytoplasm organization, mitochondrion organization, ribosome biogenesis, metabolism, nucleic acid metabolism, RNA metabolism, RNA processing, transcription, DNA-dependent transcription, and rRNA processing, each labeled with its annotated proteins among P1 to P6.]
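The annotation rule (a protein annotated on a term is also annotated on every term on the paths towards the root) can be sketched as an upward propagation over the DAG. The parent links and direct annotations below are a hypothetical fragment that reuses term names from the figure; they are assumptions for illustration, not the actual GO structure or the annotations shown on the slide.

```python
def propagate_annotations(parents, direct):
    """Return {term: set(proteins)} after pushing each direct annotation
    up to every ancestor term.

    parents: {term: list of parent terms} (a DAG, so a term may have several)
    direct:  {term: set of proteins annotated directly on that term}
    """
    full = {term: set() for term in parents}
    for term, proteins in direct.items():
        stack = [term]
        seen = set()
        while stack:
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            full.setdefault(t, set()).update(proteins)
            stack.extend(parents.get(t, ()))
    return full


# Hypothetical fragment of the hierarchy (child -> parents); assumed for this example.
parents = {
    "metabolism": [],
    "nucleic acid metabolism": ["metabolism"],
    "RNA metabolism": ["nucleic acid metabolism"],
    "RNA processing": ["RNA metabolism"],
    "rRNA processing": ["RNA processing"],
    "transcription": ["RNA metabolism"],
}
# Hypothetical direct annotations.
direct = {
    "rRNA processing": {"P1"},
    "RNA processing": {"P3"},
    "transcription": {"P2"},
}

full = propagate_annotations(parents, direct)
for term in parents:
    print(term, sorted(full[term]))
```

After propagation, general terms hold at least as many proteins as their descendants, which is what makes the annotation size of a term usable as a measure of its specificity.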

Semantic Similarity
- Reliability of Interacting Proteins
  - Average (or maximum) semantic similarity over pairs of terms whose annotations include the interacting proteins
- Structure-based Approaches
  - Path length or common parent terms
  - Problem: assumes that all edges represent uniform specificity
- Information Content-based Approaches
  - Information content of a term T is defined as $-\log P(T)$
  - $sim_{xy} = -\log P_i(x, y)$, where $P_i(x, y)$ is the proportion of the annotations of the term including x and y
  - Normalized $sim_{xy} = \dfrac{2 \times \log P_{ij}(x, y)}{\log P_i(x) + \log P_j(y)}$
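A small numeric sketch of the information content-based measures may help. It assumes that annotations have already been propagated towards the root, that the probability of a term is its annotation count divided by the count at the root, and that the term used for a protein (or a protein pair) is the most specific term, meaning the one with the fewest annotations, that includes it. These interpretations, the function names, and the toy annotation sets are assumptions, not details confirmed by the slide.

```python
import math


def smallest_term(annotations, proteins):
    """Most specific term (fewest annotated proteins) whose annotation set
    contains all the given proteins, or None if no term does."""
    candidates = [t for t, ps in annotations.items() if proteins <= ps]
    if not candidates:
        return None
    return min(candidates, key=lambda t: len(annotations[t]))


def semantic_similarity(annotations, x, y, total):
    """sim_xy = -log P(x, y), with P taken from the most specific shared term."""
    t = smallest_term(annotations, {x, y})
    if t is None:
        return 0.0
    return -math.log(len(annotations[t]) / total)


def normalized_similarity(annotations, x, y, total):
    """Normalized form: 2 log P(x, y) / (log P(x) + log P(y)), in [0, 1]."""
    t_xy = smallest_term(annotations, {x, y})
    t_x = smallest_term(annotations, {x})
    t_y = smallest_term(annotations, {y})
    if t_xy is None or t_x is None or t_y is None:
        return 0.0
    denom = math.log(len(annotations[t_x]) / total) + math.log(len(annotations[t_y]) / total)
    if denom == 0:  # both proteins are annotated only at the root
        return 0.0
    return 2 * math.log(len(annotations[t_xy]) / total) / denom


# Hypothetical, already-propagated annotations over six proteins.
annotations = {
    "root": {"P1", "P2", "P3", "P4", "P5", "P6"},
    "RNA metabolism": {"P1", "P2", "P3"},
    "RNA processing": {"P1", "P3"},
    "transcription": {"P2"},
}
total = len(annotations["root"])

print(semantic_similarity(annotations, "P1", "P3", total))
print(normalized_similarity(annotations, "P1", "P3", total))
print(normalized_similarity(annotations, "P1", "P2", total))
```

In this toy data P1 and P3 share their most specific term, so the normalized value reaches its maximum, while P1 and P2 only meet at a more general term and score lower.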

Functional Module Identification
- Functional Module
  - A set of molecules that participate in the same biological processes or functions
  - Sub-network with dense intra-connections and sparse inter-connections
  - Functional Module Identification → Graph clustering problem
- Previous Clustering Approaches
  - Density-based methods, e.g., maximum clique, quasi clique, clique percolation
  - Partition-based methods, e.g., restricted neighborhood search, Markov clustering
  - Hierarchical methods
    - Bottom-up approaches, e.g., distance-based, common neighbors
    - Top-down approaches, e.g., minimum cut, betweenness cut

Functional Influence Model
- Functional Influence
  - Influence factors: normalized weights, inverse of degree
- Measurements
  - Single-path-based method: O(|V| + |E|)
  - All-path-based method: NP
  - Random-walk-based method: O(|V|^3) × iteration ≈ O(|V|^4)
- Improvement by an efficient algorithm

Flow Simulation
- Information Flow Simulation
  - Computation of functional influence inf_s(x) of s on x ∈ V based on random walks
  - Input: a weighted interaction network and a source node s
  - Output: functional influence pattern of s
- Algorithm
  1. Initialize inf_s(s)
  2. Compute initial flow f_init(s → y)
  3. Update inf_s(y)
  4. Compute flow f_s(y → z)
  5. Repeat 3 and 4 until f_s(y → z) is less than a threshold θ
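The five steps can be sketched as follows. The update formulas are not preserved in this transcript, so the ones used here are assumptions chosen to match the stated influence factors (edge weight and inverse of degree): the source starts with influence 1.0, the initial flow on an edge is the edge weight times the source influence, and a node forwards its incoming flow along each edge scaled by the edge weight and divided by the node's degree. The function name, the `max_rounds` safety cap, and the toy network are also hypothetical.

```python
def simulate_flow(weights, source, theta=0.01, max_rounds=100):
    """Propagate flow from `source` and return the accumulated influence per node.

    weights: {node: {neighbor: weight}} for an undirected weighted network,
             with weights assumed to lie in (0, 1].
    theta:   flows below this threshold are dropped immediately.
    """
    # Step 1 (assumed): the source has full influence on itself.
    influence = {v: 0.0 for v in weights}
    influence[source] = 1.0

    # Step 2 (assumed formula): initial flow = edge weight * influence of the source.
    flows = {}
    for y, w in weights[source].items():
        f = w * influence[source]
        if f >= theta:
            flows[(source, y)] = f

    for _round in range(max_rounds):  # safety cap, not part of the slide's algorithm
        if not flows:
            break
        # Step 3: each node accumulates the flow that arrived in this round.
        incoming = {}
        for (_sender, y), f in flows.items():
            incoming[y] = incoming.get(y, 0.0) + f
        for y, f in incoming.items():
            influence[y] += f
        # Step 4 (assumed formula): forward flow, scaled by weight and inverse degree.
        next_flows = {}
        for y, f_in in incoming.items():
            degree = len(weights[y])
            for z, w in weights[y].items():
                f = w * f_in / degree
                if f >= theta:  # Step 5: trivial flow is removed as early as possible
                    next_flows[(y, z)] = f
        flows = next_flows
    return influence


# Hypothetical three-node path s - a - b with all weights 0.5.
network = {
    "s": {"a": 0.5},
    "a": {"s": 0.5, "b": 0.5},
    "b": {"a": 0.5},
}
pattern = simulate_flow(network, "s", theta=0.01)
print(pattern)
```

The returned dictionary plays the role of the functional influence pattern of the source; running the simulation once per source node would give one pattern per protein for the later pattern clustering step.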

Lower-level Algorithm

Schematic View
[Figure: flow from a source node S (1.0) with per-node values across the network, followed by Pattern Clustering.]

Time Complexity
- Efficiency
  - Traces only connected nodes to calculate the functional influence of a source
  - Removes trivial flow, i.e., flow less than θ, as early as possible
- Run Time
  - Theoretical upper bound is unknown (it does not depend on the network diameter)
  - Potential factors (number of nodes, density, average degree) are tested with synthetic networks

Accuracy
- Experiment
  - Data: yeast protein interaction network from DIP
  - Pattern clustering: pCluster algorithm (Wang et al., SIGMOD 2002)
- Evaluation
  - Functional categories and annotations from MIPS
  - Hyper-geometric p-value
- Result
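The hyper-geometric p-value used for evaluation is the probability of seeing at least the observed overlap between a module and a functional category by chance. A minimal sketch of the standard upper-tail formula follows; the parameter names and the example numbers are illustrative assumptions, not values from the experiment.

```python
from math import comb


def hypergeom_pvalue(N, M, n, k):
    """Probability that a random module of size n, drawn from N proteins of which
    M belong to a functional category, contains at least k category members."""
    upper = min(M, n)
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 proteins, 5 in the category, a module of 5 that are all members.
print(hypergeom_pvalue(10, 5, 5, 5))
```

A lower p-value means the module matches the category more strongly than chance would explain, so modules are typically ranked by their best p-value over all categories.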

Hub Protein Identification
- Hub Protein
  - Centrally located node in the modular structure of a protein interaction network (a structural hub)
  - Functionally essential protein
- Previous Centrality Measurements
  - Closeness centrality
  - Betweenness centrality
  - Bridging centrality
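Of the previous centrality measurements, closeness centrality is the simplest to illustrate. The sketch below uses the common unweighted definition, (number of other reachable nodes) divided by the sum of shortest-path distances to them; it is not the weighted closeness variant used later in the talk, and the star-shaped toy network is an assumption for illustration.

```python
from collections import deque


def closeness_centrality(adj, v):
    """Unweighted closeness of v: reachable nodes / sum of BFS distances to them."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical star network: one hub connected to three leaves.
star = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}
for node in star:
    print(node, closeness_centrality(star, node))
```

The hub reaches every other node in one step and gets the maximum score, while each leaf needs two steps to reach the other leaves and scores lower.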

Functional Influence Model
- Functional Influence
  - Influence factors: normalized weights, inverse of degree
- Measurements
  - Single-path-based method: O(|V| + |E|)
  - All-path-based method: NP
  - Random-walk-based method: O(|V|^3) × iteration ≈ O(|V|^4)
- Improvement by a heuristic algorithm

Path Strength
- Single-path-based path strength:
- All-path-based path strength:
  - Sums up the k-length path strength for all possible k
  - Uses a threshold for the maximum k

Network Conversion
- Network Conversion
  - Input: a protein interaction network / Output: a hierarchical tree structure
- Algorithm
  - Centrality (weighted closeness) of a node a:
  - Set of ancestor nodes T(a) of a:
  - Parent node p(a) of a:
- Hub Confidence Measurement
  - Set of child nodes D(a) of a:
  - Set of descendant nodes L_a of a:
  - Hub confidence H(a) of a:

Schematic View
- Hub Confidence
  - How strongly a node plays the role of a structural hub
  - Does not depend entirely on the hierarchical level in the tree structure

Structural Hubs
- Top 10 Structural Hubs in the Yeast Protein Interaction Network
  - Not related to their degree
  - Each one has several different functions

Lethality
- Biological Essentiality
  - Evaluated by comparing with lethal proteins
  - Lethality has been determined by protein knock-out experiments
- Result

Conclusion
- Problems
  - Complex and unreliable connectivity in protein interaction networks
- Contributions
  - Reliable network generation by edge weighting
  - Hidden knowledge discovery, e.g., patterns or taxonomy
  - Collaboration with existing computational techniques
- Future Work
  - Integration with multiple data sources
  - Comparative analysis across organisms

Questions?
