A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function Stuart Rison 1*, Evangelos.

Presentation transcript:

A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function
Stuart Rison (1*), Evangelos Simeonidis (2), Janet Thornton (1,3), David Bogle (2), Lazaros Papageorgiou (2#)
(1) Department of Biochemistry and Molecular Biology and (2) Department of Chemical Engineering, University College London, London, WC1E 6BT, UK
(3) Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX, UK
* Corresponding author (biology):
# Corresponding author (algorithm):

Outline
– What is pathway distance?
– Why calculate pathway distance?
– Original method
– Novel method - mathematical programming
– Application:
  – Genomic distance
  – Enzyme function

Pathway Distance
– Pathway distance considers the distance between metabolic enzymes
– Each metabolic transition represents one pathway distance unit (step)
– Should take into account: directionality, circularity
– The pathway distance between GapA and GltA is 7 steps
– The shortest pathway distance between GltA and Mdh is 8 steps (considering directionality) or 2 steps (without directionality)
[Figure: Glycolysis + TCA (pathway from EcoCyc); annotations mark one reversible step and one irreversible step]

– Reverses the “usual” pathway representation (substrates as nodes, enzymes as edges): here the enzymes are the nodes
– Pathway distance is inclusive; the source enzyme itself has a distance of 1 step
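The reversed representation can be sketched in a few lines. This is only an illustration: the function name `enzyme_links`, the toy enzymes `E1` to `E3` and substrates `s1` to `s3` are hypothetical, and the linking rule (enzyme a links to enzyme b when a product of a is a substrate of b) is an assumption, not something the slides spell out.

```python
from collections import defaultdict

def enzyme_links(reactions):
    """Build directed enzyme-to-enzyme links from substrate-level reactions.

    reactions: list of (enzyme, substrates, products) tuples.
    Assumed rule: enzyme a links to enzyme b when a product of a
    is a substrate of b.
    """
    # Map each substrate to the enzymes that consume it.
    consumers = defaultdict(set)
    for enzyme, substrates, _ in reactions:
        for s in substrates:
            consumers[s].add(enzyme)

    # Connect every producer of a compound to every consumer of it.
    links = set()
    for enzyme, _, products in reactions:
        for p in products:
            for nxt in consumers[p]:
                if nxt != enzyme:
                    links.add((enzyme, nxt))
    return links

# Hypothetical three-step cycle: s1 -> s2 -> s3 -> s1.
toy = [
    ("E1", {"s1"}, {"s2"}),
    ("E2", {"s2"}, {"s3"}),
    ("E3", {"s3"}, {"s1"}),
]
links = enzyme_links(toy)
```

Under this sketch a reversible step would be listed twice, once in each direction, so that links are produced both ways.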

Why calculate pathway distance?
– Metabolic pathways are complex networks of interacting enzymes, substrates and cofactors
– Relatively well characterised for certain organisms (e.g. E. coli)
– Much work has been done on modelling metabolism, but there is now also much interest in pathways as an indicator of “connectivity” between genes
– Pathway distance ($D_p$) is an extension of this connectivity

Original Method
– Represent pathways as directed acyclic graphs
– Use an arbitrary direction for pathways
– “Snip” open any cycle
– Perform a depth-first traversal (DFT) of the resulting graphs
– Collect the set of genes at distances 2, 3, …, n along the resulting traversals
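The traversal step of the original method might look roughly like the sketch below. It assumes the pathway has already been given a direction and had its cycles snipped, so the input is a DAG; the names `dft_distances` and `genes_by_distance` and the toy graph are hypothetical. Keeping the smallest distance seen over all traversals is also an assumption.

```python
def dft_distances(dag, source):
    """Depth-first traversal of a DAG from `source`.

    Distances are inclusive (the source is at distance 1), and the
    smallest distance seen over all traversals is kept for each node.
    """
    dist = {source: 1}
    stack = [(source, 1)]
    while stack:
        node, d = stack.pop()
        for nxt in dag.get(node, ()):
            # Visit a node again only if this traversal reaches it sooner.
            if nxt not in dist or d + 1 < dist[nxt]:
                dist[nxt] = d + 1
                stack.append((nxt, d + 1))
    return dist

def genes_by_distance(dist):
    """Group genes by distance, for distances 2, 3, ..., n."""
    groups = {}
    for gene, d in dist.items():
        if d >= 2:
            groups.setdefault(d, set()).add(gene)
    return groups

# Hypothetical snipped pathway: a branches to b and c, which rejoin at d.
toy_dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
toy_dist = dft_distances(toy_dag, "a")
toy_groups = genes_by_distance(toy_dist)
```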

Original Method
– Original EcoCyc pathways include: directionality, cycles
– Dictate directionality: arbitrarily set direction (top to bottom, clockwise)
– “Snip” cycles
[Figure: Glycolysis + TCA (pathway from EcoCyc), with mdh and gltA labelled]

Pathway Distance Algorithm
For each metabolic pathway
– For each enzyme in the pathway
  Find the minimal distances from the source enzyme to all other enzymes by solving linear programming problems of the type:
    Maximise Summation_of_Enzyme_Distances
    subject to Enzyme_Connectivity_Constraints
Post-processing “calculations” are integrated in the algorithm (e.g. genome distance or enzyme function conservation)

Algorithm - objective function and constraints
For each node $i^*$ (source):
\[
\begin{aligned}
\text{Maximise} \quad & \sum_i D_i \\
\text{subject to} \quad & D_j \le D_i + 1, && \forall (i,j) : L_{ij} = 1 \\
& 0 \le D_i \le T, && \forall i \\
& D_{i^*} = 1
\end{aligned}
\]
SETS
– $i, j$: nodes
PARAMETERS
– $L_{ij}$: 1 if there is a link from $i$ to $j$, 0 otherwise
– $T$: large number
CONTINUOUS VARIABLES
– $D_i$: distance of node $i$ from the source node
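The optimum of this formulation can be checked without an LP solver. Every constraint only bounds some $D_j$ from above, so the largest feasible point maximises the sum; it can be reached by starting every non-source node at $T$ and tightening until no link constraint is violated. The sketch below is a plain-Python stand-in under that reasoning, not the GAMS model from the talk; `lp_optimum` and the toy network are hypothetical names and data.

```python
def lp_optimum(nodes, links, source, T=1000):
    """Largest D satisfying D[j] <= D[i] + 1 on every link (i, j),
    0 <= D[i] <= T for every node, and D[source] == 1.

    Because each constraint is an upper bound, this point also
    maximises the sum of all D[i].
    """
    D = {n: T for n in nodes}   # start every node at its upper bound
    D[source] = 1               # the source is fixed at distance 1
    changed = True
    while changed:
        changed = False
        for i, j in links:
            # Enforce D[j] <= D[i] + 1 by lowering D[j] when violated.
            if D[j] > D[i] + 1:
                D[j] = D[i] + 1
                changed = True
    return D

# Hypothetical network: a cycle P -> Q -> R -> P, plus S feeding into P.
toy_nodes = ["P", "Q", "R", "S"]
toy_links = [("P", "Q"), ("Q", "R"), ("R", "P"), ("S", "P")]
toy_D = lp_optimum(toy_nodes, toy_links, "P")
```

In the toy network, S cannot be reached from P, so it stays at $T$. This shows the role of the large-number bound: it keeps the maximisation bounded for unreachable nodes and flags them in the result.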

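Algorithm - Inequalities
Example with source $i^* = A$:
\[
\begin{aligned}
\text{Max} \quad & D_A + D_B + D_C + D_D \\
\text{s.t.} \quad & D_A = 1 \\
& D_A \le D_B + 1 \\
& D_B \le D_A + 1 \\
& D_C \le D_B + 1 \\
& D_C \le D_D + 1 \\
& D_D \le D_C + 1 \\
& D_D \le D_B + 1
\end{aligned}
\]
[Figure: example network with nodes A, B, C, D]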
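A short hand calculation shows how this example resolves, reading each constraint as $D_j \le D_i + 1$ for a link from $i$ to $j$ and leaving out the bound $T$, which is not binding here. Starting from the fixed source value, the tightest upper bounds are:

```latex
\[
\begin{aligned}
D_A &= 1 \\
D_B &\le D_A + 1 = 2 \\
D_C &\le D_B + 1 = 3 \\
D_D &\le D_B + 1 = 3
\end{aligned}
\]
```

Maximising the sum pushes each variable to its bound, giving $(D_A, D_B, D_C, D_D) = (1, 2, 3, 3)$ with objective value 9. The remaining constraints are satisfied with slack: $D_A \le D_B + 1$ gives $1 \le 3$, $D_C \le D_D + 1$ gives $3 \le 4$, and $D_D \le D_C + 1$ gives $3 \le 4$.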

Key Features of Algorithm
– Hierarchical solution procedure
– Based on linear programming techniques
– Uses an enzyme-node network representation

Advantages of Algorithm
– Efficiency in tackling
  – pathway circularity
  – reaction directionality
– Modest computational times
– Implementation within the GAMS software system

Metabolic pathways
– We encoded 68 E. coli small molecule metabolism (SMM) pathways, derived from EcoCyc
– This represents a set of 594 enzymes
– Pathway distances ranged from 2 to 15

Pathway Distance and Genome Distance
– Calculate minimal pathway distances for all gene pairs in each pathway
– For the same pairs, calculate the base pair separation of the genes encoding the enzymes in the E. coli genome ($D_g$)
– Plot the percentage of gene pairs within a certain genome distance against pathway distance
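The comparison could be tabulated along the following lines. This is a sketch under stated assumptions: each gene is reduced to a single coordinate, separation is measured the shorter way round a circular chromosome, and the genome length, positions and threshold in the example are made-up round numbers rather than E. coli values. The function names are hypothetical.

```python
from collections import defaultdict

def genome_distance(pos_a, pos_b, genome_length):
    """Base-pair separation on a circular chromosome (shorter way round)."""
    d = abs(pos_a - pos_b)
    return min(d, genome_length - d)

def percent_within(pairs, threshold):
    """pairs: iterable of (D_p, D_g) tuples, one per gene pair.

    Returns {D_p: percentage of pairs at that D_p with D_g <= threshold}.
    """
    total = defaultdict(int)
    close = defaultdict(int)
    for dp, dg in pairs:
        total[dp] += 1
        if dg <= threshold:
            close[dp] += 1
    return {dp: 100.0 * close[dp] / total[dp] for dp in sorted(total)}

# Made-up example on a 1000 bp circular "genome".
toy_pairs = [
    (2, genome_distance(10, 990, 1000)),   # 20 bp apart across the origin
    (2, genome_distance(100, 600, 1000)),  # 500 bp apart
    (3, genome_distance(0, 400, 1000)),    # 400 bp apart
]
toy_percent = percent_within(toy_pairs, threshold=50)
```

Plotting the returned percentages against $D_p$ for several thresholds would give the kind of curve the slide describes.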

Genome Distance - Conclusions
– Strong correlation between $D_p$ and $D_g$
– Genes with small $D_p$ tend to have shorter $D_g$
– Genes involved in nearby metabolic reactions are genomically clustered

Pathway Distance and Function
– Calculate minimal pathway distances for all gene pairs in each pathway
– Compare the EC numbers assigned to the genes in each pair
[Figure: levels of an EC number, e.g. G-3-P dehydrogenase: 1. oxidoreductase; 2. acts on aldehyde or oxo group; 1. NAD/NADP as acceptor; final level: enzyme specific. Conservation labels shown: “L3 cons”, “No cons”]
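One simple way to compare two EC numbers is to count how many leading levels they share. The sketch below assumes that is what the conservation labels refer to (three shared levels for “L3 cons”, none for “No cons”); the function name is hypothetical and the EC strings in the example are illustrative inputs only.

```python
def conserved_ec_levels(ec_a, ec_b):
    """Count the leading EC levels shared by two EC numbers (0 to 4).

    A "-" marks an unspecified level and is never counted as conserved.
    """
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        shared += 1
    return shared

# Illustrative comparisons.
same_to_level_3 = conserved_ec_levels("1.2.1.12", "1.2.1.3")
no_conservation = conserved_ec_levels("1.2.1.12", "2.7.1.1")
```

Only leading levels are counted because EC numbers are hierarchical: a matching third digit carries no meaning when the first digit differs.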

Function - Conclusions
– No observable correlation between pathway distance and function (as represented by EC number)
– Enzymatic chemistries vary along the conversion from one substrate to the next and are not performed in “blocks” of similar catalysis

Conclusions - Algorithm
– We have an effective, correct and rapid algorithm to calculate metabolic distance
– The $D_p$ metric can be used as a measure of functional relation between proteins

Conclusions - Biology
– As expected, pathway distance correlates with genome distance
– Pathway distance does not correlate with function as determined by EC number

Acknowledgements
– Sarah Teichmann, University College London
– Peter Karp, SRI International, Menlo Park, CA
– Monica Riley, Alida Pellegrini-Toole, Marine Biological Laboratory, Woods Hole, MA


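Example with source $i^* = A$ (six nodes):
\[
\begin{aligned}
& D_A = 1 \\
& D_A \le D_B + 1 \\
& D_B \le D_A + 1 \\
& D_C \le D_B + 1 \\
& D_C \le D_D + 1 \\
& D_D \le D_C + 1 \\
& D_D \le D_B + 1 \\
& D_E \le D_D + 1 \\
& D_E \le D_F + 1 \\
& D_F \le D_C + 1 \\
& D_F \le D_E + 1
\end{aligned}
\]
[Figure: example network with nodes A, B, C, D, E, F]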