Computational Exploration of Metabolic Networks with Pathway Tools
Part 1: Overview & Representations
Suzanne Paley, Bioinformatics Research Group, SRI International

SRI International Bioinformatics
Motivation: Theories of Cellular Function Too Large for One Mind to Grasp
- Example: E. coli metabolic network
  - 160 pathways involving 744 reactions and 791 substrates
- Example: E. coli genetic network
  - Control by 97 transcription factors of 1174 genes in 630 transcription units
- Past solutions:
  - Partition theories across multiple minds
  - Encode theories in natural-language text
- We cannot compute with theories in those forms
  - Evaluate theories for consistency with new data: microarrays
  - Refine theories with respect to new data
  - Compare theories describing different organisms

Solution: Biological Knowledge Bases
- Store biological knowledge and theories in computers in a declarative form
  - Amenable to computational analysis and generative user interfaces
- Establish ongoing efforts to curate (maintain, refine, embellish) these knowledge bases
- A high-quality, comprehensive knowledge base enables us to ask and answer important new questions

Terminology
- Model Organism Database (MOD): DB describing genome and other information about an organism
- Pathway/Genome Database (PGDB): MOD that combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors, promoters, operons, DNA binding sites
- BioCyc: Collection of 15 PGDBs at BioCyc.org
  - EcoCyc, AgroCyc, HumanCyc

Pathway Tools Software
- PathoLogic
  - Prediction of metabolic network from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks

Pathway Tools Software
[Diagram components: Pathway/Genome Databases, PathoLogic Pathway Predictor, Pathway/Genome Editors, Pathway/Genome Navigator]

Pathway/Genome Database
[Diagram of a CELL with the PGDB contents: Chromosomes, Plasmids; Genes; Proteins; Reactions; Pathways; Compounds; Operons, Promoters, DNA Binding Sites]

Pathway Tools Algorithms
Visualization and editing tools for the following datatypes:
- Full Metabolic Map
  - Paint gene expression data on metabolic network; compare metabolic networks
- Pathways
  - Pathway prediction
- Reactions
  - Balance checker
- Compounds
  - Chemical substructure comparison
- Enzymes, Transporters, Transcription Factors
- Genes
- Chromosomes
- Operons
  - Operon prediction; visualize genetic network

Definitions
- Chemical reactions interconvert chemical compounds
    A + B -> C + D
- An enzyme is a protein that accelerates chemical reactions
- A pathway is a linked set of reactions
    A -> C -> E
  - Often regulated as a unit
  - A conceptual unit of the cell's biochemical machine
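The definitions above can be illustrated with a minimal Python sketch. It is not Pathway Tools code: the `Reaction` class and the `is_linked` helper are hypothetical names introduced here, and "linked" is assumed to mean that each reaction in an ordered list produces at least one compound that the next reaction consumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A reaction that converts the compounds in `left` into those in `right`."""
    name: str
    left: frozenset
    right: frozenset


def is_linked(reactions):
    """True if every reaction hands at least one product to the next one."""
    return all(a.right & b.left for a, b in zip(reactions, reactions[1:]))


# The two examples from the slide: A + B -> C + D, and the chain A -> C -> E.
r_ab = Reaction("A+B->C+D", frozenset({"A", "B"}), frozenset({"C", "D"}))
r_ac = Reaction("A->C", frozenset({"A"}), frozenset({"C"}))
r_ce = Reaction("C->E", frozenset({"C"}), frozenset({"E"}))

pathway = [r_ac, r_ce]
print(is_linked(pathway))
```

Reversing the order of the two steps breaks the link, because E is not a reactant of the first step.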

Operations of the Metabolic Overview
- Find pathways, compounds
- Find reactions
  - By enzyme name, EC number, substrates, modulation
  - All with isozymes
  - All occurring in multiple pathways
  - By EC class, pathway class
- Find genes
  - By name, gene class
  - All regulated by transcriptional regulator protein

Metabolic Overview Queries
- Species comparison
  - Highlight reactions that are
    - Shared/not-shared with
    - Any-one/All-of
    - A specified set of species
- Overlay expression data
  - Colors reflect expression level and are user-configurable
  - Can show a single experiment or an animated time series
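The species-comparison query can be read as a set computation. The sketch below is one interpretation, not the Pathway Tools implementation. The function name, its parameters, and the reaction and species identifiers are all hypothetical. "Not-shared with any-one" is taken to mean "absent from at least one of the chosen species", and "not-shared with all-of" to mean "absent from every chosen species".

```python
def compare_reactions(reference, others, shared=True, quantifier="any"):
    """Select reactions of the reference organism to highlight.

    reference  -- set of reaction ids in the organism being displayed
    others     -- dict mapping species name -> set of reaction ids
    shared     -- True to select shared reactions, False for not-shared
    quantifier -- "any" (any one of the species) or "all" (all of them)
    """
    combine = any if quantifier == "any" else all
    selected = set()
    for rxn in reference:
        present = [rxn in rxns for rxns in others.values()]
        if shared:
            hit = combine(present)
        else:
            hit = combine(not p for p in present)
        if hit:
            selected.add(rxn)
    return selected


# Hypothetical reaction sets for illustration only.
reference = {"R1", "R2", "R3"}
others = {"species-1": {"R1", "R2"}, "species-2": {"R1"}}

print(sorted(compare_reactions(reference, others, shared=True, quantifier="all")))
print(sorted(compare_reactions(reference, others, shared=False, quantifier="all")))
```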

EcoCyc Project
- E. coli Encyclopedia
  - Model-Organism Database for E. coli
  - Began in 1992 as collaboration between Karp and Riley
  - Over 3500 literature citations
- Collaborative development via Internet
  - Karp (SRI): Bioinformatics architect
  - John Ingraham: Advisor
  - (SRI): Metabolic pathways
  - Saier (UCSD) and Paulsen (TIGR): Transport
  - Collado (UNAM): Regulation of gene expression
- Ontology: 1000 biological classes
- Database content: 17,700 instances

EcoCyc = E. coli Dataset + Pathway/Genome Navigator
- Genes: 4,393
- Proteins: 4,273
- Reactions: 2,760
- Pathways: 165
- Compounds:
- Transcription Units: 724
- Factors: 110
- Enzymes: 914
- Transporters: 162
- Promoters: 812
- TransFac Sites: 956
- Citations: 3,508

MetaCyc: Metabolic Encyclopedia
- Nonredundant metabolic pathway database
- Describes a representative sample of every experimentally determined metabolic pathway
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- 460 pathways, 1267 enzymes, 4294 reactions
  - 172 E. coli pathways, 2735 citations
- Nucleic Acids Research 30:
- Jointly developed by SRI and Carnegie Institution
  - New focus on plant pathways

MetaCyc Data
- MetaCyc contains one DB object for each distinct pathway
  - Distinct in terms of reaction steps
  - Each pathway labeled with the species it occurs in
- MetaCyc pathways are experimentally determined
- 4218 reactions in MetaCyc
  - 401 lack EC numbers

MetaCyc Enzyme Data
- Reaction(s) catalyzed
- Alternative substrates
- Cofactors / prosthetic groups
- Activators and inhibitors
- Subunit structure
- Molecular weight, pI
- Comment, literature citations
- Species

MetaCyc Frequent Organisms
  Escherichia coli            156
  Arabidopsis thaliana         47
  Homo sapiens                 30
  Pseudomonas                  21
  Bacillus subtilis            20
  Salmonella typhimurium       20
  Sulfolobus solfataricus      18
  Pseudomonas putida           14
  Saccharomyces cerevisiae     14
  Haemophilus influenzae       13
  Glycine max                  11
  Deinococcus radiodurans      10

EcoCyc and MetaCyc
- Review-level databases
- Data derived primarily from biomedical literature
  - Manual entry by staff curators
  - Updates by staff curators only
- Data validation
  - Consistency constraints
  - Lisp programs that verify other semantic relationships
    - Example: detecting unbalanced chemical reactions
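As an illustration of the kind of semantic check mentioned above, here is a minimal Python sketch of a reaction balance test. The actual checks are Lisp programs inside Pathway Tools, and this is not a port of them. The function names are hypothetical, and the parser assumes plain formulas without parentheses, charges, or isotopes. The example uses the neutral molecular formulas of the succinate dehydrogenase reaction that appears later in this talk.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C4H6O4'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas):
    """Sum the atom counts over all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def is_balanced(left, right):
    """True if both sides of the reaction contain the same atoms."""
    return side_total(left) == side_total(right)


# succinate + FAD = fumarate + FADH2, written with neutral formulas.
SUCCINATE, FUMARATE = "C4H6O4", "C4H4O4"
FAD, FADH2 = "C27H33N9O15P2", "C27H35N9O15P2"

print(is_balanced([SUCCINATE, FAD], [FUMARATE, FADH2]))
print(is_balanced([SUCCINATE], [FUMARATE]))
```

The second call omits the cofactor, so two hydrogen atoms are unaccounted for and the check fails.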

Computationally-Derived PGDBs
[Diagram: PathoLogic Software integrates genome and pathway data to identify putative metabolic networks]
- Annotated Genomic Sequence: Genes/ORFs, Gene Products, DNA Sequences
- Multi-organism Pathway Database (MetaCyc): Reactions, Pathways, Compounds
- Pathway/Genome Database: Genomic Map, Genes, Gene Products, Reactions, Pathways, Compounds

PathoLogic Input/Output
- Inputs:
  - File listing genetic elements
  - Files containing DNA sequence for each genetic element
  - Files containing annotation for each genetic element
  - MetaCyc database
- Output:
  - Pathway/genome database for the subject organism
  - Directory tree for the subject organism
  - Reports that summarize:
    - Evidence contained in the input genome for the presence of reference pathways
    - Reactions missing from inferred pathways
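The two report items can be pictured with a small Python sketch. This is not PathoLogic's algorithm or scoring scheme. It only assumes that each reference pathway is a non-empty list of reaction ids and that the genome annotation yields a set of reactions with at least one assigned enzyme. All names and identifiers are hypothetical.

```python
def pathway_evidence(reference_pathways, genome_reactions):
    """Summarize, per reference pathway, which reactions the genome supports.

    reference_pathways -- dict: pathway id -> non-empty list of reaction ids
    genome_reactions   -- set of reaction ids with an enzyme in the genome
    """
    report = {}
    for pathway, reactions in reference_pathways.items():
        present = [r for r in reactions if r in genome_reactions]
        missing = [r for r in reactions if r not in genome_reactions]
        report[pathway] = {
            "present": present,
            "missing": missing,
            "fraction": len(present) / len(reactions),
        }
    return report


# Hypothetical reference pathways and annotation, for illustration only.
reference_pathways = {
    "pathway-1": ["rxn-a", "rxn-b", "rxn-c", "rxn-d"],
    "pathway-2": ["rxn-e", "rxn-f"],
}
genome_reactions = {"rxn-a", "rxn-b", "rxn-d"}

for name, entry in pathway_evidence(reference_pathways, genome_reactions).items():
    print(name, entry["fraction"], "missing:", entry["missing"])
```

A curator would still have to judge which partially supported pathways are real; the fraction is only a starting point for that review.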

PathoLogic Functionality
- Initialize schema for new PGDB
- Transform existing genome to PGDB form
- Infer metabolic pathways and store in PGDB
- Infer operons and store in PGDB
- Assist user with manual tasks
  - Assign enzymes to reactions they catalyze
  - Identify false-positive pathway predictions
  - Build protein complexes from monomers
  - Assemble Overview diagram

BioCyc Collection of Pathway/Genome DBs
- Literature-based datasets:
  - Escherichia coli (EcoCyc)
  - MetaCyc
- PGDBs at other sites:
  - Arabidopsis thaliana (TAIR)
  - Methanococcus jannaschii (EBI)
  - Saccharomyces cerevisiae (SGD)
  - Synechocystis PCC6803
- Computationally-derived datasets:
  - Agrobacterium tumefaciens
  - Caulobacter crescentus
  - Chlamydia trachomatis
  - Bacillus subtilis
  - Helicobacter pylori
  - Haemophilus influenzae
  - Homo sapiens
  - Mycobacterium tuberculosis H37Rv
  - Mycobacterium tuberculosis CDC1551
  - Mycoplasma pneumoniae
  - Pseudomonas aeruginosa
  - Treponema pallidum
  - Vibrio cholerae
- Legend on the original slide: Yellow = Open Database

HumanCyc: Human Metabolic Pathway Database
- PGDB of human metabolic pathways built using PathoLogic
- Contains information on 28,700 genes, their products, and the metabolic reactions and pathways they catalyze (no signalling pathways)
- Chromosome and contigs from Ensembl
- Human genetic loci from LocusLink
- Mitochondrion data from GenBank
- Ensembl and LocusLink gene entries were merged to eliminate redundancies where possible
- Contains links to human genome web sites
- Plan to hire one curator to refine and curate with respect to literature over a 2-year period
  - Remove false-positive predictions
  - Insert known pathways missed by PathoLogic
  - Add comments and citations from pathways and enzymes to the literature
  - Add enzyme activators, inhibitors, cofactors, tissue information
- Funded by commercial consortium

BioCyc and Pathway Tools Availability
- WWW BioCyc freely available to all
  - BioCyc.org
  - Six BioCyc DBs openly available to all
- BioCyc DBs freely available to non-profits
  - Flatfiles downloadable from BioCyc.org
  - Binary executable:
    - Sun UltraSparc-170 w/ 64MB memory
    - PC, 400MHz CPU, 64MB memory, Windows-98 or newer
  - PerlCyc API
- Pathway Tools freely available to non-profits

Information Sources
- Pathway Tools User’s Guide
  - aic-export/ecocyc/genopath/released/doc/userguide1.pdf
    - Pathway/Genome Navigator
    - Appendix A: Guide to the Pathway Tools Schema
  - aic-export/ecocyc/genopath/released/doc/userguide2.pdf
    - PathoLogic, Editing Tools
- Pathway Tools Web Site
  - Publications, programming examples, etc.
- Pathway Tools Tutorial

Pathway Tools Implementation Details
- Allegro Common Lisp
- Sun and PC platforms
- Ocelot object database
- 250,000 lines of code
- Lisp-based WWW server at BioCyc.org
  - Manages 15 PGDBs

Frame Data Model
- Frame Data Model: organizational structure for a PGDB
- Knowledge base (KB, Database, DB)
- Frames
- Slots

Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes have:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame
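To make the class/instance distinction concrete, here is a minimal Python sketch of a frame store. It is not Ocelot and mirrors none of its API. The `KnowledgeBase` class and its methods are hypothetical, and the frame ids are spelled with hyphens purely as a convention of this example.

```python
class KnowledgeBase:
    """A toy frame store: every frame has a unique id, a kind, and parents."""

    def __init__(self):
        self.frames = {}

    def _add(self, frame_id, kind, parents):
        if frame_id in self.frames:
            raise ValueError(f"duplicate frame id: {frame_id}")
        for parent in parents:
            if self.frames.get(parent, {}).get("kind") != "class":
                raise KeyError(f"unknown class: {parent}")
        self.frames[frame_id] = {"kind": kind, "parents": tuple(parents), "slots": {}}

    def add_class(self, frame_id, superclasses=()):
        self._add(frame_id, "class", superclasses)

    def add_instance(self, frame_id, classes):
        self._add(frame_id, "instance", classes)

    def all_classes(self, frame_id):
        """Every class above the frame, directly or through superclasses."""
        found, stack = set(), list(self.frames[frame_id]["parents"])
        while stack:
            cls = stack.pop()
            if cls not in found:
                found.add(cls)
                stack.extend(self.frames[cls]["parents"])
        return found

    def instances_of(self, class_id):
        """All instance frames that belong to class_id or one of its subclasses."""
        return {
            fid for fid, frame in self.frames.items()
            if frame["kind"] == "instance" and class_id in self.all_classes(fid)
        }


kb = KnowledgeBase()
kb.add_class("Genes")
kb.add_class("Pathways")
kb.add_class("Biosynthetic-Pathways", superclasses=["Pathways"])
kb.add_instance("trpA", ["Genes"])
kb.add_instance("TCA-cycle", ["Pathways"])

print(kb.instances_of("Pathways"))
print(kb.all_classes("Biosynthetic-Pathways"))
```

Because every frame lives in one dictionary keyed by id, adding a second frame with an existing id is rejected, which matches the requirement that a frame name uniquely identifies a frame.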

Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot

Properties of Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any LISP object: Integer, real, string, symbol (frame name)
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
  - Slot Gene in class Polypeptides
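The inverse-slot idea can be sketched as follows. This only illustrates the concept and says nothing about how Ocelot maintains inverses. The `SlotStore` class, the `INVERSES` table, and the polypeptide frame id `TrpA-polypeptide` are hypothetical; only the slot names Product and Gene come from the slide.

```python
from collections import defaultdict

# Pairs of slots that encode opposite relationships.
INVERSES = {"Product": "Gene", "Gene": "Product"}


class SlotStore:
    """Stores multivalued slots and keeps declared inverse slots in step."""

    def __init__(self):
        # frame id -> slot name -> list of values
        self.values = defaultdict(lambda: defaultdict(list))

    def add_value(self, frame, slot, value):
        """Add value to frame.slot; also record the reverse link if one is declared."""
        forward = self.values[frame][slot]
        if value not in forward:
            forward.append(value)
        inverse = INVERSES.get(slot)
        if inverse is not None:
            backward = self.values[value][inverse]
            if frame not in backward:
                backward.append(frame)

    def get_values(self, frame, slot):
        return list(self.values[frame][slot])


store = SlotStore()
store.add_value("trpA", "Product", "TrpA-polypeptide")

print(store.get_values("trpA", "Product"))
print(store.get_values("TrpA-polypeptide", "Gene"))
```

Writing the link once and deriving the reverse direction keeps the two slots from drifting apart during editing.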

Pathway Tools Ontology
- 1064 classes
  - Main classes such as:
    - Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions, Compounds
- 205 slots
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Classes, instances, and slots are all stored side by side in the DBMS and share a single namespace

Slot Links from Gene to Pathway Frame
[Diagram of frames connected by slots]
- Frames: Chrom; genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction succinate + FAD = fumarate + FADH2; TCA Cycle; compounds succinate, FAD, fumarate, FADH2
- Slot labels on the links: product, component-of, catalyzes, reaction, in-pathway, left, right
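The chain of slot links in the diagram can be followed programmatically. The Python sketch below stores the frames as plain dictionaries and walks gene, product, component-of, catalyzes, reaction, in-pathway. It makes several assumptions: the pairing of each gene with a polypeptide is read from the order of the labels on the slide, the frame ids `Enzrxn-sdh` and `Succinate-dehydrogenase-rxn` are invented for this example, and the fixed slot path handles only the multimer case shown here.

```python
# Frames as dictionaries: frame id -> slot name -> list of values.
KB = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "sdhB": {"product": ["Sdh-Fe-S"]},
    "sdhC": {"product": ["Sdh-membrane-1"]},
    "sdhD": {"product": ["Sdh-membrane-2"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-Fe-S": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-membrane-1": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-membrane-2": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Enzrxn-sdh"]},
    "Enzrxn-sdh": {"reaction": ["Succinate-dehydrogenase-rxn"]},
    "Succinate-dehydrogenase-rxn": {
        "left": ["succinate", "FAD"],
        "right": ["fumarate", "FADH2"],
        "in-pathway": ["TCA Cycle"],
    },
    "TCA Cycle": {},
}


def follow(frames, slot):
    """Collect the values of `slot` over a list of frames, without duplicates."""
    found = []
    for frame in frames:
        for value in KB.get(frame, {}).get(slot, []):
            if value not in found:
                found.append(value)
    return found


def pathways_of_gene(gene):
    """Walk the slot chain from a gene to the pathways it takes part in."""
    frames = [gene]
    for slot in ("product", "component-of", "catalyzes", "reaction", "in-pathway"):
        frames = follow(frames, slot)
    return frames


print(pathways_of_gene("sdhA"))
```

A monomer that catalyzes a reaction directly has no component-of link, so a general traversal would need to try both routes.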

Enzymatic-reaction frame stores properties of pairing between enzyme and reaction
[Same diagram as the previous slide, annotated with attributes]
- Frames: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction succinate + FAD = fumarate + FADH2; TCA Cycle
- Attribute labels: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position

Monofunctional Monomer
[Diagram of frames: Gene, Monomer, Enzymatic-reaction, Reaction, Pathway]

Bifunctional Monomer
[Diagram of frames: Gene, Monomer, two Enzymatic-reaction frames, two Reaction frames, Pathway]

Monofunctional Multimer
[Diagram of frames: Gene, Monomer, Multimer, Enzymatic-reaction, Reaction, Pathway]

Pathway and Substrates
[Diagram of frames: Pathway, two Reaction frames, Reactant-1, Reactant-2, Product-1, Product-2]
- Slot labels on the links: in-pathway, left, right

Genetic Network Representation
- Describe biological entities involved in control of transcription initiation
  - Promoters, operators, transcription factors, operons, terminators
- Describe molecular interactions among these entities
  - Modulation of transcription factor activity
  - Binding of transcription factors to DNA binding sites
  - Effects on transcription initiation

Ontology for Transcriptional Regulation
- One DB object defined for each biological entity and for each molecular interaction
[Diagram labels: site001, pro001, trpL, trpE, trpD, trpC, trpB, trpA, trpLEDCBA, Int001, Int002, RpoSig70, TrpR*trp, trp, apoTrpR, Complexation reaction]
- Int001 (binding of TrpR*trp to site001) inhibits Int002 (binding of RNA Polymerase to promoter) and consequently prevents transcription of the genes in the transcription unit.
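The statement that one interaction object inhibits another can be encoded directly. The Python sketch below is a simplified reading of the slide, not the Pathway Tools schema. The dictionary layout and function names are hypothetical; the identifiers Int001, Int002, site001, pro001, TrpR*trp, RpoSig70, and trpLEDCBA are the labels from the diagram. Inhibition is applied one level deep only, which is enough for this two-interaction example.

```python
# Each interaction: which molecule binds which site, and what it inhibits.
INTERACTIONS = {
    "Int001": {"binds": ("TrpR*trp", "site001"), "inhibits": ["Int002"]},
    "Int002": {"binds": ("RpoSig70", "pro001"), "inhibits": []},
}

# Int002 is the polymerase binding that starts transcription of this unit.
INITIATING_INTERACTION = {"trpLEDCBA": "Int002"}


def occurring_interactions(available):
    """Interactions whose binding molecule is available and that are not
    inhibited by another interaction whose binding molecule is available."""
    candidates = {
        name for name, info in INTERACTIONS.items()
        if info["binds"][0] in available
    }
    blocked = {
        target for name in candidates
        for target in INTERACTIONS[name]["inhibits"]
    }
    return candidates - blocked


def is_transcribed(unit, available):
    """True if the initiating interaction of the transcription unit occurs."""
    return INITIATING_INTERACTION[unit] in occurring_interactions(available)


# Without the TrpR*trp complex the polymerase can bind; with it, it cannot.
print(is_transcribed("trpLEDCBA", {"RpoSig70"}))
print(is_transcribed("trpLEDCBA", {"RpoSig70", "TrpR*trp"}))
```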

Principal Classes
- Class names are capitalized, plural
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

Principal Classes (continued)
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

Slots in Multiple Classes
- Common-Name
- Synonyms
- Names (computed as union of Common-Name, Synonyms)
- Comment
- Citations
- DB-Links
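A computed slot such as Names can be thought of as a function over the stored slots. The following Python sketch is an illustration only. The `names` function and the dictionary representation of a frame are hypothetical, and the synonym strings are placeholders rather than real database values.

```python
def names(frame):
    """Union of Common-Name and Synonyms, common name first, no duplicates."""
    candidates = [frame.get("Common-Name")] + list(frame.get("Synonyms", []))
    seen, result = set(), []
    for name in candidates:
        if name is not None and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Placeholder synonyms; a real frame would carry curated values.
gene_frame = {"Common-Name": "trpA", "Synonyms": ["synonym-1", "trpA", "synonym-2"]}

print(names(gene_frame))
```

Deriving the value on demand means it cannot fall out of date when a curator edits either of the underlying slots.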

Genes Slots
- Chromosome
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product

Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of

Polypeptides Slots
- Gene

Protein-Complexes Slots
- Components

Reactions Slots
- EC-Number
- Left, Right
- Substrates (computed as union of Left, Right)
- Enzymatic-Reaction
- DeltaG0
- Spontaneous?

Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors
- Reaction-Direction

Pathways Slots
- Reaction-List
- Predecessors
- Primaries
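One way to see why a pathway needs both Reaction-List and Predecessors is to derive a reaction order from them. The Python sketch below assumes that Predecessors holds (reaction, predecessor) pairs, which the slide does not spell out. The reaction ids are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) does the ordering. A cyclic pathway such as the TCA cycle has no linear order, and `TopologicalSorter` raises `graphlib.CycleError` for it.

```python
from graphlib import TopologicalSorter


def ordered_reactions(reaction_list, predecessors):
    """Order reactions so that each one follows all of its predecessors.

    reaction_list -- iterable of reaction ids (unordered)
    predecessors  -- iterable of (reaction, predecessor) pairs
    """
    graph = {rxn: set() for rxn in reaction_list}
    for rxn, pred in predecessors:
        graph.setdefault(rxn, set()).add(pred)
    return list(TopologicalSorter(graph).static_order())


# A hypothetical three-step linear pathway, listed out of order.
reaction_list = ["rxn-c", "rxn-a", "rxn-b"]
predecessors = [("rxn-b", "rxn-a"), ("rxn-c", "rxn-b")]

print(ordered_reactions(reaction_list, predecessors))
```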