Published by Jonas Todd. Modified over 9 years ago.
1
Computational Exploration of Metabolic Networks with Pathway Tools
Part 2: APIs & Examples

Randy Gobbel, Ph.D.
Bioinformatics Research Group
SRI International
gobbel@ai.sri.com
http://BioCyc.org/
2
SRI International Bioinformatics

Computing with Pathway Tools: APIs
- Generic functions with a consistent naming scheme
  - Basic frame access functions
  - Built-in functions for analysis and global statistics
- Simultaneous access to multiple KBs
  - Cross-species comparisons
  - Specialized KBs
    - MetaCyc
    - SchemaBase
3
Computing with Pathway Tools: APIs
- PerlCyc interface (http://aracyc.stanford.edu/~mueller/perlcyc/)
  - Library of Perl functions for querying PGDBs via a socket connection
  - Database access functions
    - Select_Organism, All_Pathways
  - Functions for performing inference / hardwired queries
    - Genes_Of_Reaction, Genes_Of_Pathway
    - Transcription_Unit_Transcription_Factors
    - Enzyme_P
- JavaCyc interface also in progress
- Lisp API
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
4
PerlCyc and JavaCyc
- Interface to a running Pathway Tools image through TCP
- Names are translated to Perl and Java conventions
- Object references are supported by means of unique frame names
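As an illustration of the name translation, the sketch below converts Lisp-style hyphenated names (such as genes-of-pathway) into Perl-style snake_case and Java-style camelCase. The exact rules PerlCyc and JavaCyc apply are not stated on this slide; to_perl_name and to_java_name are hypothetical helpers that show the general idea only.

```python
def to_perl_name(lisp_name: str) -> str:
    """Hypothetical: map a hyphenated Lisp name to a Perl-style identifier."""
    return lisp_name.lower().replace("-", "_")


def to_java_name(lisp_name: str) -> str:
    """Hypothetical: map a hyphenated Lisp name to a Java-style camelCase identifier."""
    first, *rest = lisp_name.lower().split("-")
    return first + "".join(part.capitalize() for part in rest)


for name in ["genes-of-pathway", "get-class-all-instances", "enzyme-p"]:
    print(f"{name:25} {to_perl_name(name):25} {to_java_name(name)}")
```

Frame names such as P107-PWY are data, not function names, so under this assumption they would be passed through unchanged as strings.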
5
Pathway Tools API Functions
- get_class_all_instances(Class)
  - Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
    - Polypeptides
    - Protein-Complexes
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites
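The class list nests Polypeptides and Protein-Complexes under Proteins. Assuming that a query on a general class also returns instances of its subclasses, get_class_all_instances can be modeled as a walk down the class hierarchy. The sketch below uses a hypothetical in-memory hierarchy with made-up frame names; it is not the Pathway Tools implementation.

```python
# Hypothetical miniature class hierarchy: child class -> parent class.
PARENT = {
    "Polypeptides": "Proteins",
    "Protein-Complexes": "Proteins",
}

# Hypothetical instances, each recorded under its most specific class.
DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["MONOMER-1", "MONOMER-2"],
    "Protein-Complexes": ["CPLX-1"],
}


def subclasses(cls):
    """Return cls together with all of its (transitive) subclasses."""
    result = [cls]
    for child, parent in PARENT.items():
        if parent == cls:
            result.extend(subclasses(child))
    return result


def get_class_all_instances(cls):
    """Sketch: collect instances of cls and of every subclass beneath it."""
    return [inst for c in subclasses(cls) for inst in DIRECT_INSTANCES.get(c, [])]


print(get_class_all_instances("Proteins"))  # ['MONOMER-1', 'MONOMER-2', 'CPLX-1']
```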
6
Pathway Tools API Functions
- Notation: Frame.Slot means a specified slot of a specified frame
- get_slot_value(Frame, Slot)
  - Returns the first value of Frame.Slot
- get_slot_values(Frame, Slot)
  - Returns all values of Frame.Slot
- slot_has_value_p(Frame, Slot)
  - Returns true if Frame.Slot has at least one value
- member_slot_value_p(Frame, Slot, Value)
  - Returns true if Value is one of the values of Frame.Slot
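To make the semantics of these four functions concrete, here is a Python stand-in over a hypothetical dictionary-based KB, populated with the R84-RXN data from the Reactions.dat example in this deck. It mimics the behavior described on the slide; it is not the PerlCyc or JavaCyc code.

```python
# Hypothetical in-memory KB: frame name -> {slot name -> list of values}.
KB = {
    "R84-RXN": {
        "LEFT": ["GLC-6-P", "NAD"],
        "RIGHT": ["6-P-GLUCONATE", "NADH", "PROTON"],
        "IN-PATHWAY": ["P122-PWY", "P107-PWY"],
    },
}


def get_slot_values(frame, slot):
    """All values of Frame.Slot (empty list if none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """First value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)


print(get_slot_value("R84-RXN", "LEFT"))                         # GLC-6-P
print(member_slot_value_p("R84-RXN", "IN-PATHWAY", "P107-PWY"))  # True
print(slot_has_value_p("R84-RXN", "ENZYMATIC-REACTION"))         # False
```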
7
Additional Pathway Tools Functions: Semantic Inference Layer
- Built-in functions encode commonly used queries that compute indirect DB relationships
  - genes_of_pathway, substrates_of_pathway
  - all_transcription_factors, regulon_of_protein
- See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information
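An indirect relationship such as genes_of_pathway can be computed by chaining slot lookups: pathway to reactions, reactions to enzymatic reactions, enzymatic reactions to enzymes, and enzymes (through complex subunits) to genes. The sketch below is hypothetical: the frames are made up, the slot names REACTION-LIST, ENZYMATIC-REACTION, COMPONENTS and GENE follow this deck, and the ENZYME slot linking an enzymatic reaction to its protein is an assumption.

```python
# Hypothetical frames; ENZYME is an assumed slot name.
KB = {
    "PWY-X": {"REACTION-LIST": ["RXN-1", "RXN-2"]},
    "RXN-1": {"ENZYMATIC-REACTION": ["ENZRXN-1"]},
    "RXN-2": {"ENZYMATIC-REACTION": ["ENZRXN-2"]},
    "ENZRXN-1": {"ENZYME": ["CPLX-1"]},
    "ENZRXN-2": {"ENZYME": ["MONOMER-3"]},
    "CPLX-1": {"COMPONENTS": ["MONOMER-1", "MONOMER-2"]},
    "MONOMER-1": {"GENE": ["gene-a"]},
    "MONOMER-2": {"GENE": ["gene-b"]},
    "MONOMER-3": {"GENE": ["gene-c"]},
}


def slot_values(frame, slot):
    return KB.get(frame, {}).get(slot, [])


def genes_of_protein(protein):
    """Genes of a protein, recursing through complex subunits."""
    components = slot_values(protein, "COMPONENTS")
    if components:
        return {g for sub in components for g in genes_of_protein(sub)}
    return set(slot_values(protein, "GENE"))


def genes_of_pathway(pathway):
    """Sketch of an indirect query: pathway -> reactions -> enzymes -> genes."""
    genes = set()
    for rxn in slot_values(pathway, "REACTION-LIST"):
        for enzrxn in slot_values(rxn, "ENZYMATIC-REACTION"):
            for enzyme in slot_values(enzrxn, "ENZYME"):
                genes |= genes_of_protein(enzyme)
    return genes


print(sorted(genes_of_pathway("PWY-X")))  # ['gene-a', 'gene-b', 'gene-c']
```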
8
Computing with Pathway Tools: Flat Files
- Two file formats: tab-delimited and attribute-value
- One file per datatype in each format
- Specification:
  - http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
- Examples:
  - Pathways.col: pathways and the genes encoding their enzymes
  - Enzymes.col: enzymes and the reactions they catalyze
  - Pathways.dat: full data on each pathway
  - Reactions.dat: full data on each reaction
9
Example Flat File – Pathways.dat
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//
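The attribute-value records above can be read with a few lines of code. This is a minimal sketch that handles only what the examples show (one "ATTRIBUTE - value" pair per line, "//" ending a record), plus lines starting with "#" treated as comments; other features of the format described in the flat-file specification are not handled. The SAMPLE text is abbreviated from the P107-PWY record.

```python
from collections import defaultdict

SAMPLE = """\
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - R84-RXN
SYNONYMS - ribulose-monophosphate cycle
//
"""


def parse_attribute_value(text):
    """Parse attribute-value records into a list of {attribute: [values]} dicts."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip() == "//":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        # Split on the first " - " only; values may contain hyphens.
        attr, sep, value = line.partition(" - ")
        if sep:
            current[attr].append(value)
    if current:  # tolerate a final record without '//'
        records.append(dict(current))
    return records


pathway = parse_attribute_value(SAMPLE)[0]
print(pathway["UNIQUE-ID"][0], pathway["REACTION-LIST"])
```

Attributes are kept as lists because, as in REACTION-LIST and SYNONYMS, an attribute may repeat within one record.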
10
Example Flat File – Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
11
Example Flat File – Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
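Each CHEMICAL-FORMULA value is an element/count pair, so a consumer of the file can rebuild the formula string and cross-check MOLECULAR-WEIGHT. The check below uses rounded standard atomic weights; that table is our assumption, and the PGDB's own weights may differ slightly, which accounts for the small difference from 260.137.

```python
# Rounded standard atomic weights (assumed; not taken from the PGDB).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

# CHEMICAL-FORMULA values from the GLC-6-P record above.
raw_formula = ["(C 6)", "(H 13)", "(O 9)", "(P 1)"]


def parse_formula_term(term):
    """Turn '(C 6)' into ('C', 6)."""
    element, count = term.strip("()").split()
    return element, int(count)


formula = [parse_formula_term(t) for t in raw_formula]
formula_string = "".join(f"{e}{n if n > 1 else ''}" for e, n in formula)
molecular_weight = sum(ATOMIC_WEIGHT[e] * n for e, n in formula)

print(formula_string)              # C6H13O9P
print(f"{molecular_weight:.1f}")   # 260.1
```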
12
Bioinformatics Results: Algorithms
- Query and visualization environment for genome and pathway information
- PathoLogic algorithm predicts the metabolic network of an organism from its genome
- Algorithm for global characterization of a metabolic network
- Algorithms under development for qualitative modeling of the cell
13
The Pathway Tools KB as a "Virtual Cell"
- Detailed representation of proteins, including subunits
- Protein complexes and modifications
- Links from genome, through proteins, to pathways and superpathways
14
Computing with the Metabolic Network
- Comparative analysis of metabolic networks
- Visualization of expression data
- Correlation of metabolism and transport
- Connectivity analysis of the metabolic network
- Forward propagation of metabolites
- Verification of known growth media against the metabolic network
15
Computational Exploration of PGDBs
- Infer metabolic network from genome
  - Bioinformatics 18:705, 2002
- Global properties of the metabolic network
  - Genome Research 10:568, 2000
- Global properties of the genetic network
- Comparison of whole metabolic networks
- Consistency of a PGDB with respect to known growth-media requirements
- Search for gaps in the metabolic network
  - Pacific Symposium on Biocomputing 2001:471
16
Example Studies
- Relationship of protein subunits to gene positions
- Global properties of the E. coli metabolic network
  - Reactions catalyzed by more than one enzyme
  - Enzymes that catalyze more than one reaction
  - Reactions participating in more than one pathway
    - Automatic detection of intersection points in the metabolic network
- Nutrient analyses
  - Forward propagation: given a set of nutrients, what compounds will the metabolic network produce?
  - Backtracking: given a forward-propagation result and a set of essential compounds not included in that result, what precursors must be supplied to produce those compounds?
- Operon prediction
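Backtracking can be read in several ways. One simple interpretation, sketched below with a hypothetical reaction list, walks backward from each missing essential compound through every reaction that produces it and reports, as precursors to supply, the reactants that are neither already reachable nor produced by any reaction. This is an illustration, not the published algorithm.

```python
# Hypothetical reactions: (reactants, products).
REACTIONS = [
    ({"A"}, {"B"}),
    ({"B", "X"}, {"C"}),
    ({"Y"}, {"X"}),
]


def backtrack(missing, reachable, reactions):
    """Return candidate precursors to supply so that `missing` compounds can be made."""
    precursors, visited = set(), set()
    stack = list(missing)
    while stack:
        compound = stack.pop()
        if compound in visited or compound in reachable:
            continue
        visited.add(compound)  # guards against cycles in the network
        producers = [r for r in reactions if compound in r[1]]
        if not producers:
            precursors.add(compound)  # nothing makes it: must be supplied
            continue
        for reactants, _ in producers:
            stack.extend(reactants)
    return precursors


reachable = {"A", "B"}  # e.g. the result of forward propagation from {"A"}
print(backtrack({"C"}, reachable, REACTIONS))  # {'Y'}
```

Because it follows every producing reaction, this takes the union over alternative routes, so it can over-report precursors when several routes exist.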
17
Protein Subunits and Linked Genes
- Question: are protein subunits encoded by neighboring genes?
  - Proteins are linked to genes, and gene positions are recorded in the KB
- Procedure
  - Fetch all protein complexes
  - Subunits are stored in the 'components' slot
  - Each component has a 'gene' slot
  - Genes have 'left-end-position' and 'right-end-position' slots
- Results
  - The protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
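The procedure above can be sketched as follows. The slide does not define "neighboring", so this version assumes it means consecutive in gene order (genes ranked by left-end-position), on a single linear replicon. All complexes, polypeptides, genes and coordinates are hypothetical.

```python
# Hypothetical data: complex -> subunits, subunit -> gene,
# gene -> (left-end-position, right-end-position).
COMPONENTS = {"CPLX-1": ["MONO-A", "MONO-B"], "CPLX-2": ["MONO-C", "MONO-D"]}
GENE = {"MONO-A": "g1", "MONO-B": "g2", "MONO-C": "g3", "MONO-D": "g5"}
POSITION = {"g1": (100, 900), "g2": (920, 1500), "g3": (2000, 2600),
            "g4": (2700, 3300), "g5": (3400, 4000)}

# Rank every gene by its left-end position to get genome order.
ORDER = {g: i for i, g in enumerate(sorted(POSITION, key=lambda g: POSITION[g][0]))}


def subunits_neighboring(cplx):
    """True if the complex's subunit genes occupy consecutive positions in gene order."""
    ranks = sorted({ORDER[GENE[m]] for m in COMPONENTS[cplx]})
    return ranks[-1] - ranks[0] == len(ranks) - 1


for cplx in COMPONENTS:
    print(cplx, subunits_neighboring(cplx))
# CPLX-1 True   (g1 and g2 are adjacent)
# CPLX-2 False  (g4 lies between g3 and g5)
```

A homomeric complex maps to a single gene and passes trivially, which is why the slide's statistic is restricted to heteromeric enzymes. Wrap-around on a circular chromosome is ignored here.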
18
Global properties: How many reactions are catalyzed by more than one enzyme?
- Procedure
  - get_class_all_instances('Reactions')
  - Consider only reactions with at least one value in their 'enzymatic-reaction' slot
  - result = reactions with more than one value in their 'enzymatic-reaction' slot
- Results
  - About 10% of reactions are catalyzed by more than one enzyme
  - Two classes of multi-enzyme reactions
    - Homologous enzymes
    - "Easy" reactions
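The procedure is a filter followed by a count. In this sketch a hypothetical dictionary stands in for get_class_all_instances('Reactions') plus the 'enzymatic-reaction' slot values, and, as in the slide's procedure, the number of enzymatic-reaction values is used as the number of enzymes.

```python
# Hypothetical reaction -> values of its ENZYMATIC-REACTION slot.
ENZYMATIC_REACTION = {
    "RXN-1": ["ENZRXN-1"],
    "RXN-2": ["ENZRXN-2", "ENZRXN-3"],
    "RXN-3": [],  # no known enzyme: excluded from the denominator
    "RXN-4": ["ENZRXN-4", "ENZRXN-5", "ENZRXN-6"],
}

catalyzed = [r for r, v in ENZYMATIC_REACTION.items() if v]
multi_enzyme = [r for r in catalyzed if len(ENZYMATIC_REACTION[r]) > 1]

print(multi_enzyme)  # ['RXN-2', 'RXN-4']
print(f"{len(multi_enzyme)}/{len(catalyzed)} catalyzed reactions have more than one enzyme")
```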
19
Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
- Procedure
  - get_class_all_instances('Proteins')
  - result = proteins with more than one value in the 'catalyzes' slot
- Results
  - 100 out of 607 enzymes catalyze multiple reactions
  - This is significantly more than predicted by genome sequencing projects
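Beyond a yes/no count, the same slot data gives the distribution of how many reactions each enzyme catalyzes. The proteins and slot values below are hypothetical; each 'catalyzes' value is treated as one catalyzed reaction.

```python
from collections import Counter

# Hypothetical protein -> values of its CATALYZES slot.
CATALYZES = {
    "PROT-1": ["ENZRXN-1"],
    "PROT-2": ["ENZRXN-2", "ENZRXN-3"],
    "PROT-3": [],  # not an enzyme: no CATALYZES values
    "PROT-4": ["ENZRXN-4"],
    "PROT-5": ["ENZRXN-5", "ENZRXN-6", "ENZRXN-7"],
}

# Number of catalyzed reactions per enzyme (proteins with at least one value).
enzyme_counts = {p: len(v) for p, v in CATALYZES.items() if v}

# Distribution: how many enzymes catalyze 1, 2, 3, ... reactions.
histogram = Counter(enzyme_counts.values())
multifunctional = sum(n for k, n in histogram.items() if k > 1)

print(sorted(histogram.items()))  # [(1, 2), (2, 1), (3, 1)]
print(f"{multifunctional} out of {len(enzyme_counts)} enzymes catalyze multiple reactions")
```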
20
Global properties: Reactions in multiple pathways
- Procedure
  - get_class_all_instances('Reactions')
  - result = reactions with more than one value in the 'in-pathway' slot
- Significance
  - Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
    - Could be used to identify candidate reactions for drug targets
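In the flat-file examples, R84-RXN appears in the REACTION-LIST of P107-PWY and lists P107-PWY in its IN-PATHWAY slot, so IN-PATHWAY behaves as the inverse of REACTION-LIST. Assuming that holds generally, the intersection points can be recovered from pathway records alone. P107-PWY's list is abbreviated from the Pathways.dat example; P122-PWY's list is hypothetical apart from R84-RXN.

```python
from collections import defaultdict

REACTION_LIST = {
    "P107-PWY": ["FORMATEDEHYDROG-RXN", "R84-RXN", "PGLUCISOM-RXN"],
    "P122-PWY": ["R84-RXN", "RXN-X"],  # RXN-X is hypothetical
}

# Invert REACTION-LIST to rebuild an IN-PATHWAY index.
in_pathway = defaultdict(set)
for pathway, reactions in REACTION_LIST.items():
    for rxn in reactions:
        in_pathway[rxn].add(pathway)

intersection_points = sorted(r for r, pws in in_pathway.items() if len(pws) > 1)
print(intersection_points)  # ['R84-RXN']
```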
21
Metabolic Overview Queries
- Species comparison
  - Highlight reactions that are
    - shared / not shared with
    - any one / all of
    - a specified set of species
- Overlay expression data
  - Absolute or relative expression levels
  - Reaction colors reflect expression level
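The species-comparison query combines two choices (shared or not shared, any one or all of) over a set of species. Here is a sketch of that logic with set operations, using hypothetical organisms and reaction sets; it mirrors the query's options, not the Overview display code.

```python
# Hypothetical reaction sets per organism.
REACTIONS = {
    "ORG-A": {"R1", "R2", "R3", "R4"},
    "ORG-B": {"R1", "R2", "R5"},
    "ORG-C": {"R1", "R3"},
}


def highlight(organism, others, shared=True, mode="any"):
    """Reactions of `organism` to highlight relative to a set of other species.

    shared=True,  mode="any": shared with at least one of `others`
    shared=True,  mode="all": shared with every one of `others`
    shared=False flips the per-species test to "not shared".
    """
    own = REACTIONS[organism]
    sets = [REACTIONS[o] for o in others]
    combine = any if mode == "any" else all
    return {r for r in own if combine((r in s) == shared for s in sets)}


others = ["ORG-B", "ORG-C"]
print(sorted(highlight("ORG-A", others, shared=True, mode="all")))   # ['R1']
print(sorted(highlight("ORG-A", others, shared=True, mode="any")))   # ['R1', 'R2', 'R3']
print(sorted(highlight("ORG-A", others, shared=False, mode="all")))  # ['R4']
```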
26
C. crescentus Cell Cycle Gene Expression
27
Global Consistency Checking of Biochemical Network
- Given:
  - A PGDB for an organism
  - A set of initial metabolites
- Infer:
  - What set of products can be synthesized by the small-molecule metabolism of the organism
- Can a known growth medium yield the known essential compounds?
- Pacific Symposium on Biocomputing, p. 471, 2001
28
Algorithm: Forward Propagation
[Diagram: nutrients enter the metabolite set by transport; reactions from the PGDB reaction pool "fire" when their reactants are in the metabolite set, and their products are added to the metabolite set.]
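The forward-propagation loop can be sketched as a fixed-point iteration: fire every reaction whose reactants are all present, add its products, and repeat until nothing new appears. In this sketch the reaction pool is hypothetical, the transport step is folded into the initial nutrient set, and reactions are treated as irreversible (left to right), which is a simplifying assumption.

```python
# Hypothetical reaction pool: name -> (reactants, products).
REACTION_POOL = {
    "RXN-1": ({"glucose", "ATP"}, {"glucose-6-P", "ADP"}),
    "RXN-2": ({"glucose-6-P"}, {"fructose-6-P"}),
    "RXN-3": ({"fructose-6-P", "UNAVAILABLE"}, {"never-made"}),
}


def forward_propagate(nutrients, pool):
    """Return the final metabolite set and the names of the reactions that fired."""
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in pool.items():
            if name not in fired and reactants <= metabolites:
                fired.add(name)
                metabolites |= products
                changed = True
    return metabolites, fired


mets, fired = forward_propagate({"glucose", "ATP"}, REACTION_POOL)
essential = {"fructose-6-P", "never-made"}
print(sorted(fired))              # ['RXN-1', 'RXN-2']
print(sorted(essential - mets))   # ['never-made']
```

The compounds left in essential - mets are exactly the ones a backtracking step would then try to explain.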
29
Results
- Phase I: Forward propagation
  - 21 initial compounds yielded only half of the 38 essential compounds for E. coli
- Phase II: Manually identify
  - Bugs in EcoCyc (e.g., two objects for tryptophan)
  - Missing initial protein substrates (e.g., ACP)
  - Missing pathways in EcoCyc
- Phase III: Forward propagation with 11 more initial metabolites
  - Yielded all 38 essential compounds
30
Initial Metabolites (Total: 21 compounds)
31
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc, Phase I:
- Essential compounds: 19 produced, 19 not produced
- Total compounds produced: 28%
- Reactions fired: 31%
32
Missing Essential Compounds Due To:
- Bugs in EcoCyc
- Narrow conceptualization of the problem
  - Protein substrates
- Incomplete biochemical knowledge
33
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc, Phase II (after adding 11 extra metabolites):
- Essential compounds: 38 produced, 0 not produced
- Total compounds: 49% produced, 51% not produced
- Reactions: 58% fired, 42% not fired
34
Operon Prediction
- Based on the method of Moreno-Hagelsieb et al., Bioinformatics 18 Suppl. 1 (2002)
  - Distance between genes
  - Functional classification
  - Correctly predicts 75% of transcription units and 65% of operons
- Additional information available in the PGDB
  - Pathways
  - Protein complexes
  - Transporters
  - Improved prediction performance: 80% of transcription units, 69% of operons
- Detailed paper in preparation
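To show how intergenic distance feeds a prediction, here is a deliberately crude rule: adjacent genes on the same strand are placed in one transcription unit when the gap between them is at or below a cutoff. This is a swapped-in simplification, not the Moreno-Hagelsieb method (which scores distances statistically and also uses functional classification); the genes and the 50 bp cutoff are hypothetical.

```python
# Hypothetical genes: (name, strand, left-end, right-end), sorted by position.
GENES = [
    ("geneA", "+", 100, 1000),
    ("geneB", "+", 1020, 2000),
    ("geneC", "+", 2300, 3000),
    ("geneD", "-", 3050, 4000),
]

MAX_GAP = 50  # assumed intergenic-distance cutoff in bp (illustrative only)


def predict_transcription_units(genes, max_gap=MAX_GAP):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap."""
    if not genes:
        return []
    units = [[genes[0][0]]]
    for prev, cur in zip(genes, genes[1:]):
        gap = cur[2] - prev[3]  # left end of this gene minus right end of previous
        if cur[1] == prev[1] and gap <= max_gap:
            units[-1].append(cur[0])
        else:
            units.append([cur[0]])
    return units


print(predict_transcription_units(GENES))
# [['geneA', 'geneB'], ['geneC'], ['geneD']]
```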
35
Visualization of Genetic Network
- Operon display window
- Transcription factor display window
- Highlight regulon on Overview diagram
- Paint expression data onto Overview diagram
  - Database adapter mechanism: MAGE-ML intermediate form
    - Adapter defined for SMD
  - Animation
  - User-specified mapping of color ranges
  - Import of SAM files (next release)
    - List of significantly +/- genes
- Display full genetic network (later release)
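A user-specified mapping of color ranges amounts to choosing boundaries and assigning one color per interval. The sketch below does the lookup with the standard bisect module; the boundaries, colors and the use of log2 ratios are hypothetical choices, not Pathway Tools defaults.

```python
import bisect

# Hypothetical user-specified ranges for log2 expression ratios:
# values below -1 -> blue, [-1, 1) -> grey, 1 and above -> red.
BOUNDARIES = [-1.0, 1.0]
COLORS = ["blue", "grey", "red"]


def color_for(value, boundaries=BOUNDARIES, colors=COLORS):
    """Map an expression value to the color of the range it falls in."""
    return colors[bisect.bisect_right(boundaries, value)]


for v in (-2.5, -1.0, 0.3, 1.0, 4.0):
    print(v, color_for(v))
```

bisect_right puts a value equal to a boundary in the upper range, so -1.0 maps to grey and 1.0 to red; len(COLORS) must be len(BOUNDARIES) + 1.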
36
Acknowledgements
- SRI
  - Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
- EcoCyc Project
  - Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
- MetaCyc Project
  - Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
- Stanford
  - Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
- Funding sources:
  - NIH National Center for Research Resources
  - NIH National Institute of General Medical Sciences
  - NIH National Human Genome Research Institute
  - Department of Energy Microbial Cell Project
  - DARPA BioSpice, UPC
BioCyc.org