Clustering Metabolic Networks Using Minimum Cut Trees

Ryan Kellogg (1), Allison Heath (2), Lydia Kavraki (2, 3)
(1) Carnegie Mellon University, Department of Electrical & Computer Engineering
(2) Rice University, Department of Computer Science
(3) Rice University, Department of Bioengineering

Problem

Overview
This project is about the discovery and analysis of clusters in metabolic networks. We implement an algorithm for cluster detection based on minimum cut trees. We apply it to metabolic network data, analyze the identified clusters, and discuss their biological implications.

Motivation
Finding clusters in metabolic networks is important for several reasons:
- Clusters may correspond to groups of reactions that perform a common function.
- Complex metabolic networks can be simplified based on their cluster composition.
- Clusters can give insight into large-scale organization and evolutionary history [3].

Our approach is interesting because:
- The size and number of clusters can be changed by adjusting a single parameter, α.
- The algorithm is elegant and mathematically robust.
- Execution is efficient and relies on network flow computations.

Data
Our data come from the Kyoto Encyclopedia of Genes and Genomes (KEGG). We study the full metabolism of four organisms: Saccharomyces cerevisiae, Arabidopsis thaliana, Escherichia coli, and Homo sapiens. The KEGG networks are disconnected, so we focus on the largest connected component (LCC) of each metabolic network.

Organism         Total edges   Total nodes   LCC edges   LCC nodes
S. cerevisiae           2361          2415        1174        1055
A. thaliana             2670          2818        1267        1151
E. coli                 3109          3152        1654        1470
H. sapiens              3529          3507        1785        1575

Method

Metabolic Networks as Graphs
We model metabolic networks as directed, bipartite graphs:
- One set of nodes represents compounds.
- The other set of nodes represents reactions.
- Edges associate compounds with reactions.
Metabolic networks are very complex, and this model is only a first-order approximation. It does, however, capture the topological information needed for cluster identification.

Minimum Cut Trees
The algorithm for detecting clusters is based on a structure called a minimum cut tree [2]. The minimum cut tree T of a graph G has a key property. For any two nodes, the lowest edge weight along the path between them in T equals the minimum cut between the same two nodes in G.
(Figure: an example graph and its corresponding minimum cut tree.)
Explanation: suppose we are interested in the minimum cut between nodes A and F. In the figure, the dashed red line indicates this cut, which has capacity 17. Consequently, the lowest edge weight along the path between A and F in the minimum cut tree is 17.

Clustering Algorithm
The minimum cut tree clustering (MCTC) algorithm proceeds as follows [1]:
1. Begin with an undirected, weighted graph G.
2. Add an artificial sink and connect it to each node in G with an edge of weight α. Call this structure the "expanded graph".
3. Compute the minimum cut tree of the expanded graph.
4. Remove the artificial sink from the tree. The resulting disconnected components are the clusters of G.

Results

Tuning Alpha
We seek to make the selection of α objective:
- Choose the value whose clusters "best fit" the known metabolic pathway structure.
- To calculate it, find the intersection of the average number of pathways per cluster (PPC) and the average number of clusters per pathway (CPP).
(Figure: best-fit α values for the four organisms in our study.)

Cluster Statistics
Interesting observations:
- The number of clusters changes with α in a step-like fashion.
- Moderate-sized clusters appear only for a small range of α.
- The overall behavior is as expected.

Biological Analysis
We obtain optimal clusterings for each of the four organisms and compare them with known metabolic pathways. Matches fall roughly into four categories:
- Full match: a cluster coincides exactly with a pathway.
- Partial match: a cluster is contained in, but does not cover, a pathway.
- Multi-match: a single cluster spans multiple pathways.
- No match: there is little discernible clustering in a pathway.
We present an example of each type (figures):
- Full match: E. coli, Fatty acid biosynthesis
- Partial match: S. cerevisiae, Nucleotide sugars metabolism
- Multi-match: H. sapiens, Methionine metabolism / Cysteine metabolism
- No match: A. thaliana, Reductive carboxylate cycle

Conclusion and Future Work
This is an ongoing project. More analysis is needed to determine the extent to which the MCTC algorithm is useful for understanding metabolic networks. Current progress is encouraging: the algorithm seems to produce biologically meaningful clusters with reasonable efficiency. In future work we will explore:
- cluster detection when pathway structure is unknown,
- simplified network representations based on cluster composition, and
- applications to other types of biological networks, such as motif identification in regulatory networks.

References
[1] G.W. Flake, R.E. Tarjan, K. Tsioutsiouliklis. "Graph Clustering and Minimum Cut Trees." Internet Mathematics 1: 385-408, 2002.
[2] R.E. Gomory, T.C. Hu. "Multi-terminal Network Flows." J. Soc. Indust. Appl. Math. 9: 551-571, 1961.
[3] P. Holme, M. Huss. "Discovery and Analysis of Biochemical Network Hierarchies." Bioinformatics 19: 532-538, 2003.

For questions or comments: Allison Heath, aheath@cs.rice.edu
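The graph model from the Metabolic Networks as Graphs and Data sections can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. The following are all assumptions made for the example:
- the toy reaction dictionary;
- the helper names `build_bipartite`, `largest_connected_component`, and `graph_summary`;
- the choice to treat the LCC as weakly connected, with edge direction ignored.

```python
from collections import defaultdict


def build_bipartite(reactions):
    """Directed bipartite graph as a set of edges.

    Nodes are tagged ("C", id) for compounds and ("R", id) for reactions so the
    two node sets stay disjoint. Substrates point into a reaction, and the
    reaction points out to its products.
    """
    edges = set()
    for rxn, (substrates, products) in reactions.items():
        for c in substrates:
            edges.add((("C", c), ("R", rxn)))
        for c in products:
            edges.add((("R", rxn), ("C", c)))
    return edges


def largest_connected_component(edges):
    """Node set of the largest weakly connected component (direction ignored)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    comp.add(nxt)
                    stack.append(nxt)
        if len(comp) > len(best):
            best = comp
    return best


def graph_summary(edges):
    """(total edges, total nodes, LCC edges, LCC nodes), the columns of the Data table."""
    nodes = {n for e in edges for n in e}
    lcc = largest_connected_component(edges)
    lcc_edges = [e for e in edges if e[0] in lcc]  # both endpoints share a component
    return len(edges), len(nodes), len(lcc_edges), len(lcc)


# Hypothetical toy input: reaction -> (substrates, products).
REACTIONS = {
    "r1": (["A", "B"], ["C"]),
    "r2": (["C"], ["D"]),
    "r3": (["E"], ["F"]),
}
print(graph_summary(build_bipartite(REACTIONS)))  # (7, 9, 5, 6)
```

Reaction r3 forms its own small component here, so it is dropped from the LCC. Real KEGG data would need parsing into the same substrate/product form. The poster does not say how the directed graph is converted to the undirected, weighted graph that MCTC expects. Dropping edge direction and using unit weights would be one possible choice.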
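The Minimum Cut Trees and Clustering Algorithm sections can be illustrated with one self-contained sketch. This is an illustration under stated assumptions, not the authors' implementation.
- The poster does not say how the tree is computed. Here it is built with Gusfield's variant of the Gomory-Hu construction, which avoids node contraction, on top of an Edmonds-Karp max-flow routine.
- It uses a dense capacity matrix, so it only suits small graphs.
- The toy graph and the helper names (`capacity_matrix`, `max_flow_min_cut`, `min_cut_tree`, `path_min`, `mctc`) are hypothetical.

```python
from collections import deque


def capacity_matrix(n, edges):
    """Symmetric capacity matrix for an undirected, weighted edge list."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        cap[u][v] += w
        cap[v][u] += w
    return cap


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow; returns (flow value, source side of a minimum cut)."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:  # no augmenting path: nodes reached from s are the cut side
            return flow, {v for v in range(n) if prev[v] != -1}
        path, v = [], t
        while v != s:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= bottleneck
            res[b][a] += bottleneck
        flow += bottleneck


def min_cut_tree(cap):
    """Gusfield's algorithm: tree edge (i, parent[i]) has weight[i] for i >= 1."""
    n = len(cap)
    parent, weight = [0] * n, [0] * n
    for s in range(1, n):
        t = parent[s]
        f, side = max_flow_min_cut(cap, s, t)
        weight[s] = f
        for i in range(n):  # re-hang nodes that fall on s's side of the cut
            if i != s and i in side and parent[i] == t:
                parent[i] = s
        if parent[t] in side:  # s takes t's place between t and t's old parent
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], f
    return parent, weight


def path_min(parent, weight, u, v):
    """Lowest edge weight on the tree path from u to v."""
    adj = {i: [] for i in range(len(parent))}
    for i in range(1, len(parent)):
        adj[i].append((parent[i], weight[i]))
        adj[parent[i]].append((i, weight[i]))
    stack = [(u, -1, float("inf"))]
    while stack:
        node, came_from, low = stack.pop()
        if node == v:
            return low
        for nxt, w in adj[node]:
            if nxt != came_from:
                stack.append((nxt, node, min(low, w)))
    raise ValueError("u and v are not connected")


def mctc(n, edges, alpha):
    """Add a sink joined to every node with weight alpha, build the minimum cut
    tree, then drop the sink; the remaining tree components are the clusters."""
    sink = n
    expanded = list(edges) + [(v, sink, alpha) for v in range(n)]
    parent, _ = min_cut_tree(capacity_matrix(n + 1, expanded))
    root = list(range(n))  # union-find over the original nodes
    def find(x):
        while root[x] != x:
            x = root[x]
        return x
    for i in range(1, n + 1):
        if sink not in (i, parent[i]):  # keep only tree edges that avoid the sink
            root[find(i)] = find(parent[i])
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical toy graph: two triangles (weight 30) joined by one weak edge (10).
EDGES = [(0, 1, 30), (0, 2, 30), (1, 2, 30),
         (3, 4, 30), (3, 5, 30), (4, 5, 30), (2, 3, 10)]
print({alpha: mctc(6, EDGES, alpha) for alpha in (1, 10, 100)})
```

With these weights, the three α values give three different results:
- α = 1 keeps all six nodes in one cluster.
- α = 10 recovers the two triangles.
- α = 100 leaves six singletons.

This mirrors the step-like dependence on α noted under Cluster Statistics. `path_min` checks the defining property from the Minimum Cut Trees section: the lowest tree-path weight equals the pairwise minimum cut in G. Integer (or `fractions.Fraction`) weights avoid floating-point residue in the `res[u][v] > 0` tests.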
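The α-selection rule under Tuning Alpha (intersect PPC and CPP) could be sketched as follows. The poster does not define PPC and CPP in detail, so several choices below are assumptions:
- A cluster and a pathway "touch" when they share at least one reaction.
- Clusters that touch no pathway are left out of the PPC average.
- On a discrete sweep, the crossing point is approximated by the α with the smallest |PPC - CPP| gap.

The pathways, clusterings, and α values are hypothetical.

```python
def ppc_cpp(clusters, pathways):
    """Average pathways per cluster (PPC) and average clusters per pathway (CPP).

    Assumption: a cluster and a pathway "touch" when they share a reaction;
    clusters touching no pathway are left out of the PPC average.
    """
    per_cluster = [sum(1 for p in pathways.values() if c & p) for c in clusters]
    per_cluster = [k for k in per_cluster if k > 0]
    per_pathway = [sum(1 for c in clusters if c & p) for p in pathways.values()]
    return sum(per_cluster) / len(per_cluster), sum(per_pathway) / len(per_pathway)


def best_alpha(clusterings, pathways):
    """Alpha with the smallest |PPC - CPP| gap, a discrete stand-in for the crossing."""
    def gap(alpha):
        ppc, cpp = ppc_cpp(clusterings[alpha], pathways)
        return abs(ppc - cpp)
    return min(sorted(clusterings), key=gap)


# Hypothetical pathways and clusterings (sets of reaction IDs) for three alpha values.
PATHWAYS = {"P1": {"r1", "r2", "r3"}, "P2": {"r4", "r5", "r6"}}
CLUSTERINGS = {
    1: [{"r1", "r2", "r3", "r4", "r5", "r6"}],
    10: [{"r1", "r2", "r3"}, {"r4", "r5", "r6"}],
    100: [{r} for r in ("r1", "r2", "r3", "r4", "r5", "r6")],
}
for alpha in sorted(CLUSTERINGS):
    print(alpha, ppc_cpp(CLUSTERINGS[alpha], PATHWAYS))
print("best alpha:", best_alpha(CLUSTERINGS, PATHWAYS))
```

As α grows, clusters shrink. PPC then falls toward 1 while CPP rises, which is why the two curves cross. On real data one would sweep α finely, or interpolate between samples, rather than pick from three values.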
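The Biological Analysis section defines three of its match categories per cluster, and these translate directly into set comparisons. The sketch below rests on these assumptions:
- Clusters and pathways are compared as sets of reaction identifiers.
- The pathways are hypothetical.
- The "no match" category is a per-pathway judgment ("little discernible clustering") with no precise threshold given, so it is not modeled. Clusters that fit none of the three definitions are labeled "unclassified".

```python
def match_type(cluster, pathways):
    """Label one cluster with the poster's cluster-level match categories."""
    hits = [name for name, reactions in pathways.items() if cluster & reactions]
    if len(hits) > 1:
        return "multi-match"  # spans several pathways
    if len(hits) == 1:
        pathway = pathways[hits[0]]
        if cluster == pathway:
            return "full match"  # coincides exactly with the pathway
        if cluster < pathway:
            return "partial match"  # contained in, but does not cover, the pathway
    return "unclassified"  # not covered by the three cluster-level definitions


# Hypothetical pathways given as sets of reaction IDs.
PATHWAYS = {"P1": {"r1", "r2", "r3"}, "P2": {"r4", "r5"}}
for cluster in ({"r1", "r2", "r3"}, {"r1", "r2"}, {"r3", "r4"}, {"r9"}):
    print(sorted(cluster), match_type(cluster, PATHWAYS))
```

Python's `<` on sets tests for a proper subset, which matches "contained in, but does not cover".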