1 SRI International Bioinformatics EcoCyc, MetaCyc, and the Pathway Tools Software Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International.

Presentation transcript:

1 SRI International Bioinformatics EcoCyc, MetaCyc, and the Pathway Tools Software Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org, MetaCyc.org

2 SRI International Bioinformatics MetaCyc Family of Pathway/Genome Databases
- 1,700+ databases from multiple institutions
- Cover all domains of life with microbial emphasis
- All DBs derived from MetaCyc via computational pathway prediction
  - Common schema
  - Common controlled vocabularies
  - Common methodologies
- Archives of Toxicology 2011

3 SRI International Bioinformatics Curated Databases Within the MetaCyc Family
Database | Organism | Organization | Curated From
MetaCyc | Multiorganism | SRI | 26,000
EcoCyc | E. coli | SRI | 21,000
HumanCyc | H. sapiens | SRI |
AraCyc | A. thaliana | Carnegie Instit. | 2,282
YeastCyc | S. cerevisiae | Stanford Univ | 565
MouseCyc | M. musculus | Jackson Labs |

4 SRI International Bioinformatics BioCyc Collection of 1,100 Pathway/Genome Databases
- Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Literature-Derived PGDBs
  - MetaCyc
  - EcoCyc: Escherichia coli K-12
- Tier 2: Computationally-derived DBs, Some Curation
  - HumanCyc, BsubCyc
  - Mycobacterium tuberculosis
- Tier 3: Computationally-derived DBs, No Curation
  - The remainder

5 SRI International Bioinformatics EcoCyc Project – EcoCyc.org
- E. coli Encyclopedia
  - Review-level Model-Organism Database for E. coli
  - Tracks evolving annotation of the E. coli genome and cellular networks
  - The two paradigms of EcoCyc
- "Multi-dimensional annotation of the E. coli K-12 genome"
  - Positions of genes; functions of gene products – 76% / 66% exp
  - Gene Ontology terms; MultiFun terms
  - Gene product summaries and literature citations
  - Evidence codes
  - Multimeric complexes
  - Metabolic pathways
  - Regulation of gene expression and of protein activity
- Nuc. Acids Res. 35: ASM News 70: Science 293:2040
- Karp, Gunsalus, Collado-Vides, Paulsen

6 SRI International Bioinformatics EcoCyc = E. coli Dataset + Pathway/Genome Navigator
- EcoCyc v15.0; URL: EcoCyc.org
- Genes: 4,489
- Proteins: 4,479
- Complexes: 895
- RNAs: 285
- Reactions:
  - Metabolic: 1,446
  - Transport: 287
- Pathways: 260
- Compounds: 1,830
- Regulation:
  - Operons: 3,409
  - Transcription factors: 206
  - Promoters: 1,878
  - TF binding sites: 2,394
  - Regulatory interactions: 5,345
- Citations: 21,000

7 SRI International Bioinformatics EcoCyc on the iPhone

8 SRI International Bioinformatics EcoCyc on the iPhone

9 SRI International Bioinformatics PortEco.org
- EcoCyc + PortEco = E. coli model-organism database
- Query multiple E. coli databases simultaneously
- E. coli gene expression archive
- E. coli Wiki
- ~40 E. coli and Shigella databases available at BioCyc.org

10 SRI International Bioinformatics MetaCyc: Metabolic Encyclopedia
- Describe a representative sample of every experimentally determined metabolic pathway
- Describe properties of metabolic enzymes
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- MetaCyc vs BioCyc: Experimentally elucidated pathways
- Jointly developed by
  - P. Karp, R. Caspi, C. Fulcher, SRI International
  - L. Mueller, A. Pujar, Boyce Thompson Institute
  - S. Rhee, P. Zhang, Carnegie Institution
- Nucleic Acids Research 2010

11 SRI International Bioinformatics Applications of MetaCyc
- Reference source on metabolic pathways and enzymes
- Predict pathways from genomes
- Metabolic engineering
  - Find desired metabolic pathways and reactions
  - Find enzymes with desired activities, regulatory properties
  - Determine cofactor requirements

12 SRI International Bioinformatics MetaCyc Data: Version 15.4
- Pathways: 1,747
- Reactions: 9,460
- Enzymes: 7,424
- Small molecules: 9,188
- Organisms: 2,170
- Citations: 29,900

13 SRI International Bioinformatics Comparison with KEGG
- KEGG vs MetaCyc: Reference pathway collections
  - KEGG maps are not pathways (Nuc Acids Res 34:)
    - KEGG maps contain multiple biological pathways
    - KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
    - Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes from a KEGG pathway
  - KEGG has no literature citations, no comments, less enzyme detail
  - KEGG assigns half as many reactions to pathways as MetaCyc
- KEGG vs organism-specific PGDBs
  - KEGG does not curate or customize pathway networks for each organism
  - Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis

14 SRI International Bioinformatics Comparison of Pathway Tools to KEGG
- Inference tools
  - KEGG does not predict presence or absence of pathways
  - KEGG lacks pathway hole filler, operon predictor
- Curation tools
  - KEGG does not distribute curation tools
  - No ability to customize pathways to the organism
  - Pathway Tools schema much more comprehensive
- Visualization and analysis
  - KEGG does not perform automatic pathway layout
  - KEGG metabolic-map diagram extremely limited
  - No comparative pathway analysis

15 SRI International Bioinformatics EcoCyc and MetaCyc
- Review-level databases
- Data derived primarily from biomedical literature
  - Manual entry by staff curators
  - Updates by staff curators only
- DBMS: Frame knowledge representation system
- Data validation
  - Consistency constraints
  - Lisp programs that verify other semantic relationships
    - Unbalanced chemical reactions
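
The unbalanced-reaction check mentioned above can be illustrated with a minimal sketch. It is written in Python rather than the Lisp used by Pathway Tools, and the `FORMULAS` table, function names, and example reaction are all hypothetical; it only shows the idea of comparing element totals on both sides of a reaction.

```python
from collections import Counter

# Hypothetical element compositions; a real PGDB stores these per compound.
FORMULAS = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "O2": {"O": 2},
    "CO2": {"C": 1, "O": 2},
    "H2O": {"H": 2, "O": 1},
}

def element_totals(side):
    """Sum element counts over one side of a reaction given as {compound: coefficient}."""
    totals = Counter()
    for compound, coeff in side.items():
        for element, count in FORMULAS[compound].items():
            totals[element] += coeff * count
    return totals

def imbalance(reactants, products):
    """Return {element: products - reactants} for each element that does not balance."""
    left, right = element_totals(reactants), element_totals(products)
    return {e: right[e] - left[e] for e in set(left) | set(right) if left[e] != right[e]}

# Aerobic glucose oxidation, once written correctly and once with a missing water.
balanced = imbalance({"glucose": 1, "O2": 6}, {"CO2": 6, "H2O": 6})
unbalanced = imbalance({"glucose": 1, "O2": 6}, {"CO2": 6, "H2O": 5})
print(balanced)
print(unbalanced)
```

An empty result means the reaction balances; a non-empty result names the elements a curator would need to look at.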

16 SRI International Bioinformatics Pathway Tools Software

17 SRI International Bioinformatics Pathway Tools Software
[Diagram of components: Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator, Genome-Scale Flux Model]
Briefings in Bioinformatics 11:

18 SRI International Bioinformatics Pathway Tools Software: PathoLogic
- Computational creation of new Pathway/Genome Databases
- Transforms genome into Pathway Tools schema and layers inferred information above the genome
- Predicts operons
- Predicts metabolic network
- Predicts which genes code for missing enzymes in metabolic pathways
- Infers transport reactions from transporter names

19 SRI International Bioinformatics Pathway Tools Software: Pathway/Genome Editors
- Interactively update PGDBs with graphical editors
- Support geographically distributed teams of curators with object database system
- Gene and protein editor
- Reaction editor
- Compound editor
- Pathway editor
- Operon editor
- Publication editor

20 SRI International Bioinformatics Pathway Tools Software: Pathway/Genome Navigator
- Querying and visualization of:
  - Pathways
  - Reactions
  - Metabolites
  - Genes/Proteins/RNA
  - Regulatory interactions
  - Chromosomes
- Two modes of operation:
  - Web mode
  - Desktop mode
  - Most functionality shared, but each has unique functionality

21 SRI International Bioinformatics Pathway Tools Implementation Details
- Platforms:
  - Macintosh, PC/Linux, and PC/Windows
- Same binary can run as desktop app or Web server
- Production-quality software
  - Version control
  - Two regular releases per year
  - Extensive quality assurance
  - Extensive documentation
  - Auto-patch
  - Automatic DB-upgrade
- 480,000 lines of Lisp code

22 SRI International Bioinformatics Why Do We Code in Common Lisp?
- Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11: )
  - The average Lisp program ran 33 times faster than the average Java program
  - The average Lisp program was written 5 times faster than the average Java program
- Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver
  - The Lisp version had half as many lines of code

23 SRI International Bioinformatics Cellular Overview Diagram
- Combines metabolic map and transporters
- Automatically generated for each organism
- Zoomable, queryable
- Web-based and desktop
- BioCyc.org
  - Tools → Cellular Overview
  - Tools → Regulatory Overview
  - Fastest with Safari, Chrome, Firefox

27 SRI International Bioinformatics Omics Data Graphing on Cellular Overview

30 SRI International Bioinformatics Genome Overview

31 SRI International Bioinformatics Genome Poster

32 SRI International Bioinformatics Regulatory Overview and Omics Viewer Show regulatory relationships among gene groups

33 SRI International Bioinformatics Genome Browser ChIP-Chip Data Shown in Graph Track

34 SRI International Bioinformatics Enrichment Analysis
- "My experiments yielded a set of genes/metabolites. What do they have in common?"
- Given a set of genes:
  - What GO terms are statistically over-represented in that set?
  - What metabolic pathways are over-represented?
  - What transcriptional regulators are over-represented?
- Given a set of metabolites:
  - What metabolic pathways are statistically over-represented in that set?
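
One common way to answer "is this category over-represented?" is a one-sided hypergeometric test. The slides do not say which statistic Pathway Tools uses, so the sketch below is an assumption for illustration only; the function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population`,
    of which `annotated` carry the category of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 genes in the organism, 5 in a pathway,
# and 3 of the 4 genes from the experiment fall in that pathway.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=3)
print(f"p = {p:.4f}")
```

A small p-value suggests the category appears in the gene set more often than chance would predict. In practice a multiple-testing correction would be applied across all categories tested.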

35 SRI International Bioinformatics Automated Generation of Metabolic Flux Models from PGDBs Joint work with Mario Latendresse

36 SRI International Bioinformatics Goals
- Decrease the time required to construct FBA models from 9-12 months to several weeks
- Create richer FBA models that are tightly coupled to genome and regulatory information
- Make FBA models and results more transparent

37 SRI International Bioinformatics Approach: Derive FBA Models from PGDBs
- Store and update metabolic model within Pathway Tools
- Export to constraint solver for model execution/solving
- Fast generation of metabolic model from annotated genome
- Pathway Tools schema
  - Associate a wealth of information with each metabolic model
  - Unique identifiers and controlled vocabulary for model components
- Tools for querying and visualization of metabolic models
- Tools for model debugging and analysis
  - Reaction balance checking
  - Dead-end metabolite analysis
  - Visualize reaction flux using cellular overview
  - Multiple gap filling
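
Dead-end metabolite analysis, listed above among the debugging tools, can be sketched as a set comparison. The data layout, function name, and toy network below are hypothetical, and all reactions are assumed irreversible for simplicity; this is not the Pathway Tools implementation.

```python
def dead_end_metabolites(reactions, nutrients, secretions):
    """Find compounds that are only produced or only consumed.

    `reactions` maps a reaction name to (reactants, products), both sets.
    Nutrients count as produced (supplied from outside) and secretions
    count as consumed (they can leave the system).
    """
    produced, consumed = set(), set()
    for reactants, products in reactions.values():
        consumed.update(reactants)
        produced.update(products)
    produced |= set(nutrients)
    consumed |= set(secretions)
    return {
        "never_produced": sorted(consumed - produced),
        "never_consumed": sorted(produced - consumed),
    }

# Toy network: X has no source and D has no sink.
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"X"}, {"D"}),
}
report = dead_end_metabolites(reactions, nutrients={"A"}, secretions={"C"})
print(report)
```

Each flagged compound points at a possible gap: a missing reaction, a missing transporter, or an incorrect reaction direction.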

38 SRI International Bioinformatics FBA Generation Module: Inputs
[Diagram of inputs: Nutrients, Biomass, Secretions, Reaction List]

39 SRI International Bioinformatics FBA Formulation as Linear Program
- Boundary reactions:
  - Exchange fluxes for nutrients and secretions
  - Biomass reaction: L-arginine … + GTP … + … → biomass
- For each internal metabolite M:
  - R1: A + M → B
  - R2: C + M → D
  - R3: E + M → F + G
  - R4: X + Y → M
  - R5: W → M + Z
- Consuming fluxes balance producing fluxes:
  - R1 + R2 + R3 = R4 + R5
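
The steady-state constraint for metabolite M can be checked numerically. The sketch below encodes the five reactions R1 to R5 from the slide as stoichiometric coefficients; the flux values and the function name are hypothetical, and a real FBA run would hand such constraints to a solver instead of testing one flux vector.

```python
# Stoichiometry of R1..R5 as {metabolite: coefficient};
# negative means consumed, positive means produced.
REACTIONS = {
    "R1": {"A": -1, "M": -1, "B": 1},
    "R2": {"C": -1, "M": -1, "D": 1},
    "R3": {"E": -1, "M": -1, "F": 1, "G": 1},
    "R4": {"X": -1, "Y": -1, "M": 1},
    "R5": {"W": -1, "M": 1, "Z": 1},
}

def net_production(metabolite, fluxes):
    """Net rate of change of `metabolite` for the given reaction fluxes."""
    return sum(REACTIONS[r].get(metabolite, 0) * v for r, v in fluxes.items())

# Hypothetical flux vectors: the first satisfies R1 + R2 + R3 = R4 + R5.
steady = {"R1": 2.0, "R2": 1.0, "R3": 3.0, "R4": 4.0, "R5": 2.0}
depleting = {"R1": 2.0, "R2": 1.0, "R3": 3.0, "R4": 4.0, "R5": 1.0}

steady_net = net_production("M", steady)
depleting_net = net_production("M", depleting)
print(steady_net, depleting_net)
```

A net production of zero for every internal metabolite is the equality constraint of the linear program; the objective is then typically the flux through the biomass reaction.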

40 SRI International Bioinformatics FBA Model Execution
- Runs SCIP solver on .lp file
  - Konrad-Zuse-Zentrum für Informationstechnik Berlin
- Interpret SCIP output
  - Determine if SCIP found a solution
  - Map fluxes to PGDB reactions
- Display resulting fluxes on the Cellular Overview

41 SRI International Bioinformatics Model Debugging via Multiple Gap Filling
- Most FBA models are not initially solvable because of incomplete or incorrect information
- Use meta-optimization to postulate alterations to a model to render it solvable
- Each alteration has an associated cost; minimize cost of alterations
- Formulate as MILP and submit to SCIP

42 SRI International Bioinformatics Multiple Gap Filling of FBA Models
- Reaction gap filling (Kumar et al, BMC Bioinf :212):
  - Reverse directionality of selected reactions
  - Add a minimal number of reactions from MetaCyc to the model to enable a solution
  - Reaction cost is a function of reaction taxonomic range
- Metabolite gap filling: Postulate additional nutrients and secretions
- Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
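
The idea of adding a minimum-cost set of reactions can be illustrated with a deliberately simplified sketch. It swaps the MILP for exhaustive search and swaps FBA feasibility for simple reachability from the nutrients, so it is a toy stand-in, not the method described above; all reaction names, costs, and function names are hypothetical.

```python
from itertools import combinations

def reachable(reactions, nutrients):
    """Compounds producible from the nutrients by repeatedly firing reactions."""
    found = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= found and not set(products) <= found:
                found |= set(products)
                changed = True
    return found

def cheapest_gap_fill(model, candidates, nutrients, biomass):
    """Return (cost, names) of the cheapest candidate subset that makes every
    biomass compound reachable, or None if no subset works.

    `candidates` maps a name to ((reactants, products), cost).
    """
    best = None
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            added = [candidates[n][0] for n in subset]
            if set(biomass) <= reachable(model + added, nutrients):
                cost = sum(candidates[n][1] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best

# Toy model: D cannot be made from A until a B -> C step is added.
model = [(("A",), ("B",)), (("C",), ("D",))]
candidates = {
    "X1": ((("B",), ("C",)), 1.0),
    "X2": ((("A",), ("D",)), 3.0),
    "X3": ((("B",), ("E",)), 1.0),
}
fix = cheapest_gap_fill(model, candidates, nutrients={"A"}, biomass={"D"})
print(fix)
```

Exhaustive search grows exponentially with the number of candidates, which is one reason a real gap filler would formulate the problem for an MILP solver instead.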

43 SRI International Bioinformatics MILP Objective Function for Gap Filling
$$\sum_i w_b B_i + \sum_a w_r R_a + \sum_b w_t R_b + \sum_c w_m R_c + \sum_k w_s S_k + \sum_p w_n N_p$$
where $w_b > 0$ and $w_r, w_t, w_m, w_s, w_n < 0$ are weights for biomass, reactions (2), secretions, and nutrients, and $B_i, R_a, R_b, R_c, S_k, N_p$ are binary variables.

44 SRI International Bioinformatics Results – FBA Model of Human Metabolism
- 46 biomass compounds
- 13 nutrients
- 2 secretions
- 207 reactions carry non-zero flux

45 SRI International Bioinformatics Gap Filler Suggestions
- Addition of 8 new reactions from MetaCyc; 4 supported by literature research
- Reversal of 4 reactions confirmed by literature searches
- Enzyme curated into wrong compartment
- FBA analysis identified an amino-acid biosynthetic pathway that should not have been present in HumanCyc
- Further issues identified by dead-end metabolite analysis and reachability analysis

47 SRI International Bioinformatics Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
  - Compare reaction complements
  - Compare pathway complements
  - Compare transporter complements

48 SRI International Bioinformatics Advanced Query Form Intuitive construction of complex database queries of SQL power

49 SRI International Bioinformatics Work in Progress
- Computation of reaction atom mappings
- Program to generate metabolic pathways that synthesize target compound from feedstock compound

50 SRI International Bioinformatics How to Learn More
- BioCyc.org Help menu
- BioCyc Webinars
  - Biocyc.org/webinar.shtml
- Publications page
  - Biocyc.org/publications.shtml
- Tutorials held at SRI
  - Next week: FBA