A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function Stuart Rison 1*, Evangelos Simeonidis 2, Janet Thornton 1,3, David Bogle 2, Lazaros Papageorgiou 2# 1 Department of Biochemistry and Molecular Biology and 2 Department of Chemical Engineering, University College London, London, WC1E 6BT, UK 3 Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX, UK * Corresponding author (biology): # Corresponding author (algorithm):
Outline What is pathway distance? Why calculate pathway distance? Original method Novel method - mathematical programming Application: –Genomic distance –Enzyme function
Pathway Distance Pathway distance considers distance between metabolic enzymes Each metabolic transition represents a pathway distance unit (step) Should take into account: –directionality –circularity Examples: –The pathway distance between GapA and GltA is 7 steps –The shortest pathway distance between GltA and Mdh is 8 steps (considering directionality) or 2 steps (without directionality) [Figure: Glycolysis + TCA (pathway from EcoCyc), with steps marked as reversible or irreversible]
Reverses the “usual” pathway representation (substrates as nodes, enzymes as edges) Pathway distance is inclusive; the source enzyme has a distance of 1 step
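The enzyme-node representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction tuples, the enzyme names (`enzA`, `enzB`, `enzC`) and the substrate names are hypothetical, and the rule "link enzyme i to enzyme j when a product of i is a substrate of j" is one plausible reading of the slide, not the authors' encoding.

```python
from collections import defaultdict

def enzyme_graph(reactions):
    """Build enzyme -> set(enzyme) links.

    reactions: list of (enzyme, substrates, products, reversible) tuples
    (a hypothetical input format chosen for this sketch).
    """
    # Expand each reversible reaction into its two directions.
    directed = []
    for enzyme, subs, prods, reversible in reactions:
        directed.append((enzyme, set(subs), set(prods)))
        if reversible:
            directed.append((enzyme, set(prods), set(subs)))

    # Enzymes become nodes; a shared metabolite becomes a directed link.
    links = defaultdict(set)
    for e1, _, prods1 in directed:
        for e2, subs2, _ in directed:
            if e1 != e2 and prods1 & subs2:
                links[e1].add(e2)
    return dict(links)

# Hypothetical pathway: S0 <-> S1 <-> S2 -> S3
reactions = [
    ("enzA", ["S0"], ["S1"], True),
    ("enzB", ["S1"], ["S2"], True),
    ("enzC", ["S2"], ["S3"], False),
]
graph = enzyme_graph(reactions)
```

In this toy pathway the two reversible steps give links in both directions between `enzA` and `enzB`, while the irreversible last step gives only `enzB` to `enzC`.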
Why calculate pathway distance? Metabolic pathways are complex networks of interacting enzymes, substrates and cofactors Relatively well characterised for certain organisms (e.g. E. coli) Much work done on modelling metabolism, but now also much interest in pathways as an indicator of “connectivity” between genes Pathway distance (D_p) is an extension of this connectivity
Original Method Represent pathways as directed acyclic graphs: –Use an arbitrary direction for pathways –“Snip” open any cycle Perform a depth-first traversal (DFT) of the resulting graphs Collect the set of genes at distances 2, 3, …, n along the resulting traversals
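A minimal Python sketch of the traversal step, assuming the pathway has already been turned into an acyclic graph. The node names and the snipped link are hypothetical, and the bookkeeping (keeping the smallest inclusive distance seen per gene) is an assumption about how the distance sets were collected.

```python
def genes_by_distance(dag, source):
    """Depth-first traversal from `source` over an acyclic graph.

    Returns {distance: set of genes}, using inclusive distances
    (the source itself is at distance 1, so results start at 2).
    """
    best = {}
    stack = [(source, 1)]
    while stack:
        node, dist = stack.pop()
        if node in best and best[node] <= dist:
            continue  # already reached at least as cheaply
        best[node] = dist
        for nxt in dag.get(node, ()):
            stack.append((nxt, dist + 1))

    by_dist = {}
    for node, dist in best.items():
        if dist >= 2:
            by_dist.setdefault(dist, set()).add(node)
    return by_dist

# Hypothetical four-enzyme cycle a -> b -> c -> d -> a,
# "snipped" open by dropping the d -> a link.
snipped = {"a": ["b"], "b": ["c"], "c": ["d"]}
from_a = genes_by_distance(snipped, "a")
from_d = genes_by_distance(snipped, "d")
```

Starting from `d`, nothing is reachable once the cycle is snipped, even though `a` was one transition away in the original cycle. This is the kind of information the snipping step can lose.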
Original Method Original EcoCyc pathways include: –Directionality –Cycles Dictate directionality: arbitrarily set direction (top to bottom, clockwise) “Snip” cycles [Figure: Glycolysis + TCA (pathway from EcoCyc); labelled genes: mdh, gltA]
Pathway Distance Algorithm For each metabolic pathway –For each enzyme in the pathway: find the minimal distances from the source enzyme to all other enzymes by solving linear programming problems of the type: Maximise Summation_of_Enzyme_Distances subject to Enzyme_Connectivity_Constraints Post-processing “calculations” are integrated in the algorithm (e.g. genome distance or enzyme function conservation)
Algorithm - objective function and constraints
For each node $i^*$ (source):
Maximise $\sum_i D_i$
subject to:
$D_j \le D_i + 1, \quad \forall (i,j): L_{ij} = 1$
$0 \le D_i \le T, \quad \forall i$
$D_{i^*} = 1$
SETS –$i, j$: nodes
PARAMETERS –$L_{ij}$: 1 if there is a link from $i$ to $j$, 0 otherwise –$T$: large number
CONTINUOUS VARIABLES –$D_i$: distance of node $i$ from the source node
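Algorithm - Inequalities
Example network with nodes A, B, C, D and source $i^* = A$:
Max $D_A + D_B + D_C + D_D$
s.t.
$D_A = 1$
$D_A \le D_B + 1$
$D_B \le D_A + 1$
$D_C \le D_B + 1$
$D_C \le D_D + 1$
$D_D \le D_C + 1$
$D_D \le D_B + 1$
[Figure: network diagrams with nodes A, B, C, D]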
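The four-node example can be checked with a short Python sketch. Note the swapped-in technique: instead of calling an LP solver (the slides use GAMS), this starts every distance at the upper bound T and repeatedly tightens any violated constraint of the form D_j ≤ D_i + 1. That yields the componentwise-largest feasible point, which is also the maximiser of the sum. The function name and the `links` dictionary layout are assumptions made for this illustration.

```python
def max_feasible_distances(links, source, T=10**6):
    """Largest D satisfying D[source] = 1, D[j] <= D[i] + 1 for each
    link i -> j, and 0 <= D[i] <= T (the LP optimum for this problem)."""
    nodes = set(links) | {j for targets in links.values() for j in targets}
    nodes.add(source)

    D = {n: T for n in nodes}   # start at the upper bound
    D[source] = 1               # inclusive distance of the source

    changed = True
    while changed:              # tighten until every constraint holds
        changed = False
        for i, targets in links.items():
            for j in targets:
                if D[j] > D[i] + 1:
                    D[j] = D[i] + 1
                    changed = True
    return D

# Links read off the inequalities: D_j <= D_i + 1 means a link i -> j.
links = {"A": ["B"], "B": ["A", "C", "D"], "C": ["D"], "D": ["C"]}
D = max_feasible_distances(links, "A")
objective = sum(D.values())
```

Nodes that cannot be reached from the source keep the value T, which matches the role of the "large number" bound in the formulation.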
Key Features of Algorithm Hierarchical solution procedure Based on linear programming techniques Using an enzyme-node network representation
Advantages of Algorithm Efficiency in tackling –pathway circularity –reaction directionality Modest computational times Implementation within GAMS software system
Metabolic pathways We encoded 68 E. coli small molecule metabolism (SMM) pathways derived from EcoCyc This represents a set of 594 enzymes Pathway distances ranged from 2 to 15
Pathway Distance and Genome Distance Calculate minimal pathway distances for all gene pairs in each pathway For the same pairs, calculate the base pair separation of the genes encoding the enzymes in the E. coli genome (D_g) Plot percentage of gene pairs within a certain genome distance against pathway distance
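The plotted quantity can be sketched as follows. All numbers are made-up illustrations, not data from the study. Treating the genome as circular when measuring separation is an assumption (the slide only says "base pair separation"), as are the function names and the `(pathway distance, genome distance)` tuple layout.

```python
from collections import defaultdict

def genome_distance(pos_a, pos_b, genome_length=None):
    """Base pair separation of two gene positions; if genome_length is
    given, take the shorter way round a circular chromosome."""
    sep = abs(pos_a - pos_b)
    if genome_length is not None:
        sep = min(sep, genome_length - sep)
    return sep

def percent_within(pairs, threshold_bp):
    """pairs: iterable of (pathway_distance, genome_distance_bp).
    Returns {pathway_distance: % of pairs with genome distance <= threshold}."""
    total = defaultdict(int)
    close = defaultdict(int)
    for d_p, d_g in pairs:
        total[d_p] += 1
        if d_g <= threshold_bp:
            close[d_p] += 1
    return {d_p: 100.0 * close[d_p] / total[d_p] for d_p in sorted(total)}

# Hypothetical gene pairs: (pathway distance, genome distance in bp)
pairs = [(2, 500), (2, 1500), (2, 1800), (2, 90000), (3, 800), (3, 400000)]
result = percent_within(pairs, threshold_bp=2000)
wrapped = genome_distance(50, 950, genome_length=1000)
```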
Genome Distance - Conclusions Strong correlation between D_p and D_g Genes with small D_p tend to have shorter D_g Genes involved in nearby metabolic reactions are genomically clustered
Pathway Distance and Function Calculate minimal pathway distances for all gene pairs in each pathway Compare the EC numbers assigned to the genes in each pair e.g. G-3-P dehydrogenase, EC 1.2.1.x: –1: oxidoreductase –2: acts on aldehyde or oxo group –1: NAD/NADP as acceptor –x: enzyme specific [Figure legend: L3 cons, No cons]
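One simple way to compare two EC numbers is to count how many leading levels they share. The sketch below assumes that this is what "conservation" means here and that an unassigned level (written `-`) never counts as a match. The function name and the example EC strings are illustrative inputs only.

```python
def ec_conservation_level(ec_a, ec_b):
    """Number of leading EC levels (0 to 4) shared by two EC numbers."""
    level = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break  # first mismatch (or unassigned level) ends the match
        level += 1
    return level

same_to_level_3 = ec_conservation_level("1.2.1.12", "1.2.1.3")
no_conservation = ec_conservation_level("1.2.1.12", "2.7.1.1")
identical = ec_conservation_level("1.2.1.12", "1.2.1.12")
partial = ec_conservation_level("1.2.1.-", "1.2.1.-")
```

A pair scoring 3 shares class, subclass and sub-subclass but differs in the enzyme-specific final level; a pair scoring 0 differs already at the top-level class.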
Function - Conclusions No observable correlation between pathway distance and function (as represented by EC number) Enzymatic chemistries are varied along the conversion from one substrate to the next and aren’t performed in ‘blocks’ of similar catalysis
Conclusions - Algorithm We have an effective, correct and rapid algorithm to calculate metabolic distance The D_p metric can usefully serve as a measure of protein functional relation
Conclusions - Biology As expected, pathway distance correlates with genome distance Pathway distance does not correlate with function as determined by EC number
Acknowledgements Sarah Teichmann, University College London Peter Karp, SRI International, Menlo Park, CA Monica Riley, Alida Pellegrini-Toole, Marine Biological Laboratory, Woods Hole, MA
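Example network with nodes A, B, C, D, E, F and source $i^* = A$:
$D_A = 1$
$D_A \le D_B + 1$
$D_B \le D_A + 1$
$D_C \le D_B + 1$
$D_C \le D_D + 1$
$D_D \le D_C + 1$
$D_D \le D_B + 1$
$D_E \le D_D + 1$
$D_E \le D_F + 1$
$D_F \le D_C + 1$
$D_F \le D_E + 1$
[Figure: network diagrams with nodes A, B, E, C, D, F]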
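As a worked check of the six-node system, assume the same objective as in the general formulation (maximise the sum of all six distances). Each variable is then pushed up to its tightest bound: B is limited by A, both C and D by B, E by D, and F by C. A sketch of the resulting optimum:

```latex
\begin{aligned}
D_A &= 1 \\
D_B &= D_A + 1 = 2 \\
D_C &= D_B + 1 = 3, \qquad D_D = D_B + 1 = 3 \\
D_E &= D_D + 1 = 4, \qquad D_F = D_C + 1 = 4 \\
\sum_i D_i &= 1 + 2 + 3 + 3 + 4 + 4 = 17
\end{aligned}
```

The remaining inequalities (for example, E bounded by F plus one, and C bounded by D plus one) hold with slack at this point, so they do not alter the solution.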