Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples Randy Gobbel, Ph.D. Bioinformatics Research Group SRI International.



SRI International Bioinformatics
Computing with Pathway Tools: APIs
- Generic functions with a consistent naming scheme
  - Basic frame access functions
  - Built-in functions for analysis and global statistics
- Simultaneous access to multiple KBs
  - Cross-species comparisons
  - Specialized KBs
    - MetaCyc
    - SchemaBase

Computing with Pathway Tools: APIs
- PerlCyc interface
  - Library of Perl functions for querying PGDBs via socket connection
  - Database access functions
    - Select_Organism, All_Pathways
  - Functions for performing inference / hardwired queries
    - Genes_Of_Reaction, Genes_Of_Pathway
    - Transcription_Unit_Transcription_Factors
    - Enzyme_P
- JavaCyc interface also in progress
- Lisp API

PerlCyc and JavaCyc
- Interface to a running Pathway Tools image through TCP
- Names are translated to Perl and Java conventions
- Object references are supported by means of unique frame names

Pathway Tools API Functions
- get_class_all_instances(Class)
  - Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
    - Polypeptides
    - Protein-Complexes
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites

Pathway Tools API Functions
- Notation: Frame.Slot means a specified slot of a specified frame
- get_slot_value(Frame Slot)
  - Returns first value of Frame.Slot
- get_slot_values(Frame Slot)
  - Returns all values of Frame.Slot
- slot_has_value_p(Frame Slot)
  - Returns true if Frame.Slot has at least one value
- member_slot_value_p(Frame Slot Value)
  - Returns true if Value is one of the values of Frame.Slot
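The four accessors above can be mimicked on a plain in-memory store. This is a minimal Python sketch, not the Pathway Tools API itself: the `KB` dictionary and the Python signatures are assumptions made for illustration, and only the function names and their described behavior come from the slide. The sample slot values are taken from the R84-RXN record in the flat-file example.

```python
# Hypothetical in-memory knowledge base: frame name -> slot name -> list of values.
# This stands in for a real PGDB, which would be reached through an API connection.
KB = {
    "R84-RXN": {
        "in-pathway": ["P122-PWY", "P107-PWY"],
        "left": ["GLC-6-P", "NAD"],
        "enzymatic-reaction": [],  # empty slot, used to show the "no value" case
    },
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot (an empty list if there are none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return len(get_slot_values(frame, slot)) > 0


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)
```

In this sketch an unknown frame or slot behaves like an empty slot rather than raising an error; a real knowledge base may well behave differently.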

Additional Pathway Tools Functions – Semantic Inference Layer
- Built-in functions encode commonly used queries that compute indirect DB relationships
  - genes_of_pathway, substrates_of_pathway
  - all_transcription_factors, regulon_of_protein
- See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information
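To illustrate what "indirect DB relationship" means, the sketch below follows a chain of slots from a pathway to its genes. It is an assumption-laden toy, not the built-in genes_of_pathway: the `KB` contents are invented (apart from the frame names P107-PWY, R84-RXN and R12-RXN), and the `enzyme` slot name is assumed, while `reaction-list`, `enzymatic-reaction`, `components` and `gene` follow names used elsewhere in these slides.

```python
# Hypothetical toy KB: frame -> slot -> values. Enzyme and gene frames are invented.
KB = {
    "P107-PWY": {"reaction-list": ["R84-RXN", "R12-RXN"]},
    "R84-RXN": {"enzymatic-reaction": ["ENZRXN-1"]},
    "R12-RXN": {"enzymatic-reaction": ["ENZRXN-2"]},
    "ENZRXN-1": {"enzyme": ["COMPLEX-1"]},   # 'enzyme' slot name is an assumption
    "ENZRXN-2": {"enzyme": ["MONOMER-C"]},
    "COMPLEX-1": {"components": ["MONOMER-A", "MONOMER-B"]},
    "MONOMER-A": {"gene": ["geneA"]},
    "MONOMER-B": {"gene": ["geneB"]},
    "MONOMER-C": {"gene": ["geneC"]},
}


def values(frame, slot):
    """All values of frame.slot, or an empty list."""
    return KB.get(frame, {}).get(slot, [])


def genes_of_protein(protein):
    """Collect the genes of a protein, descending into complex components."""
    genes = set(values(protein, "gene"))
    for part in values(protein, "components"):
        genes |= genes_of_protein(part)
    return genes


def genes_of_pathway(pathway):
    """Pathway -> reactions -> enzymatic reactions -> enzymes -> genes."""
    genes = set()
    for rxn in values(pathway, "reaction-list"):
        for enzrxn in values(rxn, "enzymatic-reaction"):
            for enzyme in values(enzrxn, "enzyme"):
                genes |= genes_of_protein(enzyme)
    return sorted(genes)
```

A single call such as `genes_of_pathway("P107-PWY")` hides four slot traversals, which is the convenience the built-in functions are meant to provide.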

Computing with Pathway Tools: Flat Files
- Two file formats: tab-delimited, attribute-value
- One file for each format, each datatype
- Specification:
- Examples:
  - Pathways.col – Pathways and genes encoding enzymes
  - Enzymes.col – Enzymes and reactions they catalyze
  - Pathways.dat – Full data on each pathway
  - Reactions.dat – Full data on each reaction

Example Flat File
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//

Example Flat File – Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC
EC-NUMBER
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//

Example Flat File – Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
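The attribute-value records above are simple enough to read with a few lines of code. The following Python sketch assumes only what the examples show: one `ATTRIBUTE - value` pair per line, repeated attributes for multiple values, and `//` closing a record. The function name `parse_dat`, the skipping of `#` comment lines, and the handling of attributes without a value are assumptions; features of the real format not shown on these slides (for example multi-line values) are not handled.

```python
def parse_dat(text):
    """Parse attribute-value text into a list of records.

    Each record maps an attribute name to the list of its values, in file order.
    A line consisting of '//' ends the current record.
    """
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        if line == "//":
            if current:
                records.append(current)
            current = {}
            continue
        # Split on the first " - " only, so hyphens inside values are preserved.
        attr, sep, value = line.partition(" - ")
        values = current.setdefault(attr.strip(), [])
        if sep:  # an attribute with no value keeps an empty list
            values.append(value.strip())
    if current:  # tolerate a final record without a closing '//'
        records.append(current)
    return records


# Sample text copied from the Reactions.dat example.
SAMPLE = """\
UNIQUE-ID - R84-RXN
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
"""

reactions = parse_dat(SAMPLE)
```

Storing every attribute as a list keeps single-valued and multi-valued slots uniform, which matches the get_slot_values style of access described earlier.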

Bioinformatics Results: Algorithms
- Query and visualization environment for genome and pathway information
- PathoLogic algorithm predicts the metabolic network of an organism from its genome
- Algorithm for global characterization of a metabolic network
- Algorithms under development for qualitative modeling of the cell

The Pathway Tools KB as a "virtual cell"
- Detailed representation of proteins, including subunits
- Protein complexes and modifications
- Links from genome, through proteins, to pathways and superpathways

Computing with the Metabolic Network
- Comparative analysis of metabolic networks
- Visualization of expression data
- Correlation of metabolism and transport
- Connectivity analysis of metabolic network
- Forward propagation of metabolites
- Verification of known growth media with metabolic network

Computational Exploration of PGDBs
- Infer metabolic network from genome
  - Bioinformatics 18:
- Global properties of the metabolic network
  - Genome Research 10:
- Global properties of the genetic network
- Comparison of whole metabolic networks
- Consistency of a PGDB with respect to known growth-media requirements
- Search for gaps in metabolic network
  - Pacific Symp Biocomputing 2001:471

Example Studies
- Relationship of protein subunits to gene positions
- Global properties of the E. coli metabolic network
  - Reactions catalyzed by more than one enzyme
  - Enzymes that catalyze more than one reaction
  - Reactions participating in more than one pathway
    - Automatic detection of intersection points in the metabolic network
- Nutrient analyses
  - Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network?
  - Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
- Operon prediction

Protein subunits and linked genes
- Question: are protein subunits coded by neighboring genes?
  - Proteins are linked to genes, gene positions are recorded in the KB
- Procedure
  - Fetch all protein complexes
  - Subunits are stored in the ‘components’ slot
  - Each component has a ‘gene’ slot
  - Genes have ‘left-end-position’ and ‘right-end-position’ slots
- Results
  - Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
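The procedure above can be sketched once the slot values have been fetched. In the Python sketch below all data are invented, and the definition of "neighboring" (the subunit genes occupy consecutive ranks when genes are sorted by left-end-position on one replicon) is an assumption; the slide does not say which definition the study used.

```python
# Hypothetical data: gene -> (left-end-position, right-end-position).
GENE_POSITIONS = {
    "geneA": (100, 900),
    "geneB": (950, 1800),
    "geneC": (5000, 5600),
    "geneD": (9000, 9900),
}

# Hypothetical heteromeric complexes: complex -> genes of its 'components'.
COMPLEX_GENES = {
    "COMPLEX-1": ["geneA", "geneB"],
    "COMPLEX-2": ["geneA", "geneD"],
}


def gene_ranks(positions):
    """Rank genes by left-end-position along a single replicon."""
    ordered = sorted(positions, key=lambda g: positions[g][0])
    return {gene: rank for rank, gene in enumerate(ordered)}


def encoded_by_neighbors(genes, ranks):
    """True if the (distinct) genes occupy consecutive ranks in genome order."""
    r = sorted(ranks[g] for g in set(genes))
    return len(r) > 1 and r[-1] - r[0] == len(r) - 1


RANKS = gene_ranks(GENE_POSITIONS)
neighboring = {c: encoded_by_neighbors(g, RANKS) for c, g in COMPLEX_GENES.items()}
fraction = sum(neighboring.values()) / len(neighboring)
```

On a real PGDB the same check would be applied to every protein complex with more than one distinct subunit gene, and `fraction` would be the statistic reported on the slide.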

Global properties: How many reactions are catalyzed by more than one enzyme?
- Procedure
  - get_class_all_instances(‘Reactions’)
  - We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
  - result = reactions with more than one value for their ‘enzymatic-reaction’ slot
- Results
  - About 10% of reactions are catalyzed by more than one enzyme
  - Two classes of multi-enzyme reactions
    - Homologous enzymes
    - “Easy” reactions

Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
- Procedure
  - get_class_all_instances(‘Proteins’)
  - result = proteins with more than one value in the ‘catalyzes’ slot
- Results
  - 100 out of 607 enzymes catalyze multiple reactions
  - This is significantly more than predicted by genome sequencing projects

Global properties: Reactions in multiple pathways
- Procedure
  - get_class_all_instances(‘Reactions’)
  - result = reactions with more than one value in the ‘in-pathway’ slot
- Significance
  - Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
    - Could be used to identify candidate reactions for drug targets
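The three global-property queries above share one pattern: fetch the instances of a class, then count the values in one slot. A minimal Python sketch of that pattern follows. The `CLASSES` and `FRAMES` tables are invented toy data, and the helper `multi_valued` is hypothetical; only get_class_all_instances, get_slot_values and the slot names come from the slides.

```python
# Hypothetical toy KB. Class membership and slot values are invented.
CLASSES = {
    "Reactions": ["RXN-1", "RXN-2", "RXN-3"],
    "Proteins": ["ENZ-1", "ENZ-2", "PROT-3"],
}
FRAMES = {
    "RXN-1": {"enzymatic-reaction": ["ER-1", "ER-2"], "in-pathway": ["PWY-1"]},
    "RXN-2": {"enzymatic-reaction": ["ER-3"], "in-pathway": ["PWY-1", "PWY-2"]},
    "RXN-3": {"enzymatic-reaction": [], "in-pathway": []},
    "ENZ-1": {"catalyzes": ["ER-1", "ER-3"]},
    "ENZ-2": {"catalyzes": ["ER-2"]},
    "PROT-3": {"catalyzes": []},
}


def get_class_all_instances(cls):
    """Return the instances of cls (stand-in for the API call of the same name)."""
    return list(CLASSES[cls])


def get_slot_values(frame, slot):
    """Return all values of frame.slot."""
    return FRAMES[frame].get(slot, [])


def multi_valued(cls, slot):
    """Return (frames with more than one value, frames with at least one value)."""
    with_value = [f for f in get_class_all_instances(cls) if get_slot_values(f, slot)]
    multiple = [f for f in with_value if len(get_slot_values(f, slot)) > 1]
    return multiple, with_value


multi_enzyme_rxns, catalyzed_rxns = multi_valued("Reactions", "enzymatic-reaction")
multifunctional, enzymes = multi_valued("Proteins", "catalyzes")
intersection_rxns, pathway_rxns = multi_valued("Reactions", "in-pathway")
```

Returning both lists makes the denominator explicit, so a ratio such as `len(multifunctional) / len(enzymes)` counts only proteins that actually catalyze something.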

Metabolic Overview Queries
- Species comparison
  - Highlight reactions that are
    - Shared/not-shared with
    - Any-one/All-of
    - A specified set of species
- Overlay expression data
  - Absolute or relative expression levels
  - Reaction color reflects expression level
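The species-comparison options above amount to set operations over reaction lists. The Python sketch below is one possible reading, with invented reaction identifiers; the function `compare_reactions` and the interpretation of each option combination (spelled out in the docstring) are assumptions, not the behavior of the Overview tool.

```python
def compare_reactions(reference, others, shared=True, mode="any"):
    """Select reactions of `reference` by comparison with the species in `others`.

    shared=True,  mode="any": present in at least one other species
    shared=True,  mode="all": present in every other species
    shared=False, mode="any": absent from every other species
    shared=False, mode="all": absent from at least one other species
    """
    if not others:
        raise ValueError("at least one comparison species is required")
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    sets = [set(o) for o in others]
    pool = set.union(*sets) if mode == "any" else set.intersection(*sets)
    ref = set(reference)
    return sorted(ref & pool if shared else ref - pool)


# Hypothetical reaction lists for a reference organism and two other species.
REFERENCE = ["R1", "R2", "R3"]
OTHERS = [["R1", "R2"], ["R2", "R4"]]
```

For example, `compare_reactions(REFERENCE, OTHERS, shared=True, mode="all")` selects only the reactions found in every listed species.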


C. crescentus Cell Cycle Gene Expression

Global Consistency Checking of Biochemical Network
- Given:
  - A PGDB for an organism
  - A set of initial metabolites
- Infer:
  - What set of products can be synthesized by the small-molecule metabolism of the organism
- Can known growth medium yield known essential compounds?
- Pacific Symposium on Biocomputing p

Algorithm: Forward Propagation
Diagram labels: Nutrient set; Metabolite set; “Fire” reactions; Transport; Products; Reactants; PGDB reaction pool
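A minimal Python sketch of forward propagation, under simplifying assumptions: each reaction is a one-directional (reactants, products) pair, a reaction "fires" once all of its reactants are in the metabolite set, and transport and reversibility are ignored. The function name and the two TOY reactions are hypothetical; R84-RXN and its substrates are copied from the flat-file example.

```python
def forward_propagate(nutrients, reactions):
    """Fire reactions until no new metabolites appear.

    reactions maps a reaction id to (reactants, products).
    Returns (set of reachable metabolites, set of fired reaction ids).
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:  # repeat until a full pass fires nothing new
        changed = False
        for rxn_id, (reactants, products) in reactions.items():
            if rxn_id not in fired and set(reactants) <= metabolites:
                fired.add(rxn_id)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


REACTIONS = {
    # Hypothetical reaction that depends on a product of R84-RXN.
    "TOY-RXN-1": (["6-P-GLUCONATE"], ["TOY-PRODUCT"]),
    # From the Reactions.dat example.
    "R84-RXN": (["GLC-6-P", "NAD"], ["6-P-GLUCONATE", "NADH", "PROTON"]),
    # Hypothetical reaction whose reactant is never supplied or produced.
    "TOY-RXN-2": (["MISSING-COFACTOR"], ["NEVER-MADE"]),
}

metabolites, fired = forward_propagate(["GLC-6-P", "NAD"], REACTIONS)
essential = {"TOY-PRODUCT", "NEVER-MADE"}  # hypothetical essential compounds
not_produced = essential - metabolites
```

Comparing the result against a list of essential compounds, as in the last two lines, is the consistency check: anything left in `not_produced` points to a missing nutrient, a missing reaction, or an error in the database.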

Results
- Phase I: Forward propagation
  - 21 initial compounds yielded only half of 38 essential compounds for E. coli
- Phase II: Manually identify
  - Bugs in EcoCyc (e.g., two objects for tryptophan)
  - Missing initial protein substrates (e.g., ACP)
  - Missing pathways in EcoCyc
- Phase III: Forward propagation with 11 more initial metabolites
  - Yielded all 38 essential compounds

Initial Metabolites (Total: 21 compounds)

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc: Phase I
- Essential compounds: produced 19, not produced 19
- Total compounds produced: (28%)
- Reactions fired: (31%)

Missing Essential Compounds Due To
- Bugs in EcoCyc
- Narrow conceptualization of the problem
  - Protein substrates
- Incomplete biochemical knowledge

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc: Phase II (after adding 11 extra metabolites)
- Essential compounds: produced 38, not produced 0
- Total compounds: produced (49%), not produced (51%)
- Reactions: fired (58%), not fired (42%)

Operon Prediction
- Based on the method of Moreno-Hagelsieb et al., Bioinformatics 18 Suppl. 1 (2002)
  - Distance between genes
  - Functional classification
  - Correctly predicts 75% of transcription units, 65% of operons
- Additional information available in PGDB
  - Pathways
  - Protein complexes
  - Transporters
  - Improved prediction performance: 80% of transcription units, 69% of operons
- Detailed paper in preparation
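To make the "distance between genes" feature concrete, the Python sketch below groups adjacent same-strand genes whose intergenic gap is small. It is a deliberately crude stand-in: a fixed threshold replaces the scoring used in the cited method, functional classification and the PGDB-derived features are left out, and the gene coordinates, the function name and the 50 bp default are all invented for illustration.

```python
def group_by_distance(genes, max_gap=50):
    """Group genes into candidate transcription units.

    genes is a list of (name, left, right, strand) tuples on one replicon.
    Adjacent genes join the same unit when they share a strand and the
    gap between them is at most max_gap base pairs (overlaps count as small).
    """
    ordered = sorted(genes, key=lambda g: g[1])  # sort by left end position
    units = []
    prev = None
    for gene in ordered:
        name, left, right, strand = gene
        if prev is not None and strand == prev[3] and left - prev[2] <= max_gap:
            units[-1].append(name)
        else:
            units.append([name])
        prev = gene
    return units


# Hypothetical gene coordinates.
GENES = [
    ("a", 100, 900, "+"),
    ("b", 920, 1800, "+"),   # 20 bp after a, same strand: joins a
    ("c", 1810, 2500, "-"),  # close to b but on the other strand: new unit
    ("d", 4000, 4800, "-"),  # same strand as c but 1500 bp away: new unit
]

units = group_by_distance(GENES)
```

Extra evidence of the kind listed on the slide (shared pathway, shared protein complex, transporter membership) could be used to merge or split these distance-based groups.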

Visualization of Genetic Network
- Operon display window
- Transcription factor display window
- Highlight regulon on Overview diagram
- Paint expression data onto Overview diagram
  - Database adapter mechanism: MAGE-ML intermediate form
    - Adapter defined for SMD
  - Animation
  - User specified mapping of color ranges
  - Import of SAM files (next release)
    - List of significantly +/- genes
- Display full genetic network (later release)

Acknowledgements
- SRI
  - Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
- EcoCyc Project
  - Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
- MetaCyc Project
  - Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
- Stanford
  - Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
- Funding sources:
  - NIH National Center for Research Resources
  - NIH National Institute of General Medical Sciences
  - NIH National Human Genome Research Institute
  - Department of Energy Microbial Cell Project
  - DARPA BioSpice, UPC
BioCyc.org