Published by Janis Terry. Modified over 9 years ago.
1 SRI International Bioinformatics
The BioCyc Ontologies
Markus Krummenacker, Bioinformatics Research Group, SRI International
kr@ai.sri.com
BioCyc.org, EcoCyc.org, MetaCyc.org, HumanCyc.org
2 Overview
- Pathway/Genome Databases (PGDBs)
  - BioCyc collection
  - EcoCyc, MetaCyc
- Pathway Tools Software & Applications
  - Visualization, Editing, Analysis, Omics data
  - Inference tools: PathoLogic, Operon predictor, Pathway hole filler
  - Tools for debugging a predicted metabolic network
- Some Ontology Details
  - Pathways, Reactions and Compounds, Enzymes, Genes
  - Regulation
  - Integration with other efforts: BioPAX, GO, NCBI Taxonomy
3 Model Organism Databases / PGDBs
- DBs that describe the genome and molecular machinery of one specific organism
  - Integrating many diverse types of data into a coherent model of a cell
- Every sequenced organism with an active experimental community requires a MOD
  - Integrate genome data with information about the biochemical and genetic network of the organism
  - Integrate literature-based information with computational predictions
  - Ongoing updating of sequence, gene positions and functions, regulatory sites, pathways
- MODs are platforms for global analyses of the organism
  - Interpret omics data in a pathway context
  - In silico prediction of essential genes
  - Characterize systems properties of metabolic and genetic networks
4 BioCyc Collection of Pathway/Genome Databases
- Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Literature-derived PGDBs
  - MetaCyc
  - EcoCyc: Escherichia coli K-12
- Tier 2: Computationally derived DBs, some curation (20 PGDBs)
  - HumanCyc
  - Mycobacterium tuberculosis
- Tier 3: Computationally derived DBs, no curation (349 DBs)
5 Pathway Tools: PathoLogic Inference
[Diagram labels: Annotated Genome, MetaCyc Reference Pathway DB, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator]
6 Pathway Tools Software: PGDBs Created Outside SRI
- 1,300+ licensees: 75+ groups applying software to 200+ organisms
- Saccharomyces cerevisiae, SGD project, Stanford University
- Mouse, MGD, Jackson Laboratory
- dictyBase, Northwestern University
- Under development:
  - CGD (Candida albicans), Stanford University
  - Drosophila, P. Ebert in collaboration with FlyBase
  - C. elegans, P. Ebert in collaboration with WormBase
- Planned:
  - RGD (Rat), Medical College of Wisconsin
- Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
- PlantCyc, ~20 plant PGDBs, Carnegie Institution of Washington
- Six Solanaceae species, Cornell University
- GrameneDB, Cold Spring Harbor Laboratory
- Medicago truncatula, Samuel Roberts Noble Foundation
7 Pathway Tools Software: PGDBs Created Outside SRI
- BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB
- Gary Xie, Los Alamos Lab, dental pathogens
- F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
- V. Schachter, Genoscope, Acinetobacter
- M. Bibb, John Innes Centre, Streptomyces coelicolor
- G. Church, Harvard, Prochlorococcus marinus, multiple strains
- E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
- R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
- Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
- Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
- Sergio Encarnacion, UNAM, Sinorhizobium meliloti
- Gregory Fournier, MIT, Mesoplasma florum
- Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
- Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
- Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
8 Pathway Tools Software: PGDBs Created Outside SRI
- Large scale users:
  - C. Medigue, Genoscope, 150+ PGDBs
  - G. Burger, U Montreal, 60+ PGDBs
  - Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
- Partial listing of outside PGDBs at BioCyc.org
9 Pathway Evidence
10 Pathway Tools Overviews and Omics Viewers
- Designed to avoid the hairball effect
- Generated automatically from PGDB
- Magnify, interrogate
- Omics viewers paint omics data onto overview diagrams
  - Different perspectives on same dataset
  - Use animation for multiple time points or conditions
  - Paint any data that associates numbers with genes, proteins, reactions, or metabolites
- Provide genome-scale visualizations of cellular networks
- Harness human visual system to interpret patterns in biological contexts
11 Regulatory Overview and Omics Viewer
- Show regulatory relationships among gene groups
14 Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
  - Compare reaction complements
  - Compare pathway complements
  - Compare transporter complements
15 Pathway Tools Ontology
- 1621 Classes
  - Main classes such as:
    - Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions (EC), Compounds
  - Cell Component Ontology
  - Protein Feature Ontology
- 221 Slots for attributes and relationships
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Evidence codes, supporting citations
16 Pathway/Genome Database Schema
17 Protein Feature Ontology
18 Advanced Query Form
- Intuitive construction of complex database queries with the expressive power of SQL
19 Enzymatic-Reactions
[Diagram: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; complex Succinate dehydrogenase; an Enzymatic-reaction frame; reaction Succinate + FAD = fumarate + FADH2; pathway TCA Cycle. Slot labels: product, component-of, catalyzes, reaction, in-pathway]
20 Need for Enzymatic-Reactions
- Reactions can have isozymes
- Enzymes can be multi-functional
  - Enzymatic-Reaction frames are needed to decouple the many-to-many relationships
  - Isozymes may have different inhibitors, etc.
- Gene-Reaction schema diagrams:
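The decoupling described on this slide can be sketched as a small link-object model. This is only an illustration, not the Pathway Tools schema: the class `EnzymaticReaction`, its fields, the helper functions, and the frame names `enzyme-A`, `rxn-1`, and so on are all hypothetical. The point is that an attribute such as an inhibitor list lives on the enzyme/reaction pairing rather than on the enzyme or the reaction alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnzymaticReaction:
    """Hypothetical link frame: one enzyme paired with one reaction it catalyzes."""
    enzyme: str
    reaction: str
    # Pairing-specific attributes, e.g. inhibitors that apply to this isozyme only.
    inhibitors: List[str] = field(default_factory=list)


def enzymes_for(reaction: str, links: List[EnzymaticReaction]) -> List[str]:
    """All enzymes (isozymes) linked to a reaction."""
    return sorted({link.enzyme for link in links if link.reaction == reaction})


def reactions_for(enzyme: str, links: List[EnzymaticReaction]) -> List[str]:
    """All reactions linked to a (possibly multifunctional) enzyme."""
    return sorted({link.reaction for link in links if link.enzyme == enzyme})


links = [
    EnzymaticReaction("enzyme-A", "rxn-1", inhibitors=["compound-X"]),
    EnzymaticReaction("enzyme-B", "rxn-1"),  # isozyme of enzyme-A for rxn-1
    EnzymaticReaction("enzyme-A", "rxn-2"),  # enzyme-A is multifunctional
]

print(enzymes_for("rxn-1", links))
print(reactions_for("enzyme-A", links))
```

Because the inhibitor list sits on the link, `enzyme-A` can be inhibited by `compound-X` when catalyzing `rxn-1` without implying anything about `enzyme-B` or about `rxn-2`.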
21 New Representation of Regulation
- Previously, regulation was represented idiosyncratically:
  - One representation for modulation of enzymes
  - Completely different representation for regulation of transcription initiation
- Now unified under a single Regulation class with subclasses
- This enables us to easily add support for new kinds of regulation, e.g.
  - Transcriptional attenuation (done)
  - Regulation of translation by small RNAs (in progress)
- New tools for display and editing of new Regulation classes
22 Operons and Transcription Units
- Operon: A set of two or more genes that are transcribed as a unit. May include multiple promoters.
- Transcription Unit: A set of one or more genes that are transcribed as a unit from a single promoter.
- The Pathway Tools schema does not represent operons explicitly, only transcription units
23 Ontology for Transcriptional Regulation
[Diagram: frames trpLEDCBA, trpLEDCBAp1, trpL, trpE, trpD, trpC, trpB, trpA, reg001, site001, BR001, TrpR*trp, apoTrpR, trp. Slot labels: components, left, right, regulated-by, associated-binding-site, regulator]
24 Representation of Transcriptional Regulation
- Transcription-Unit
  - Components include genes, a single promoter, zero or more terminators
- Binding-Sites
  - Linked to regulation frames
- Regulation frames
  - Transcriptional Initiation: defines a 3-way pairing between promoter, transcription factor and binding-site
  - Transcriptional Attenuation: defines relationship between terminator and the entity (tRNA, protein, small molecule) that regulates it
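The three-way pairing for transcriptional initiation can be sketched with plain records. This is a minimal sketch under assumptions: the class and field names are hypothetical, and the way the frame names from the trp diagram (`trpLEDCBAp1`, `reg001`, `site001`, `TrpR*trp`) are wired together is inferred from their labels, not taken from the actual database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptionUnit:
    """Hypothetical record: genes, exactly one promoter, zero or more terminators."""
    name: str
    promoter: str
    genes: List[str]
    terminators: List[str] = field(default_factory=list)


@dataclass
class InitiationRegulation:
    """Hypothetical regulation frame pairing promoter, regulator and binding site."""
    name: str
    regulator: str      # transcription factor
    promoter: str       # regulated entity
    binding_site: str   # site the regulator binds


def regulators_of(tu: TranscriptionUnit, regs: List[InitiationRegulation]) -> List[str]:
    """Regulators reached by following promoter -> regulation frame -> regulator."""
    return [r.regulator for r in regs if r.promoter == tu.promoter]


trp = TranscriptionUnit(
    name="trpLEDCBA",
    promoter="trpLEDCBAp1",
    genes=["trpL", "trpE", "trpD", "trpC", "trpB", "trpA"],
)
reg001 = InitiationRegulation(
    name="reg001",
    regulator="TrpR*trp",
    promoter="trpLEDCBAp1",
    binding_site="site001",
)

print(regulators_of(trp, [reg001]))
```

Keeping the pairing in its own record means a second transcription factor acting on the same promoter through a different site is just another `InitiationRegulation` instance.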
25 Infer Anti-Microbial Drug Targets
- Infer drug targets as genes coding for enzymes that catalyze chokepoint reactions
- Two types of chokepoint reactions:
- Genome Research 14:917 (2004)
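The two chokepoint types are not spelled out in the surviving slide text. The sketch below assumes the commonly used definition: a reaction is a chokepoint if it is the only reaction that consumes some metabolite, or the only reaction that produces some metabolite. The function name, the input format, and the toy network are hypothetical, and any further filtering a real analysis might apply is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Reaction = Tuple[Iterable[str], Iterable[str]]  # (reactants, products)


def chokepoints(reactions: Dict[str, Reaction]) -> Set[str]:
    """Return reactions that uniquely consume or uniquely produce a metabolite."""
    consumers = defaultdict(set)  # metabolite -> reactions consuming it
    producers = defaultdict(set)  # metabolite -> reactions producing it
    for name, (reactants, products) in reactions.items():
        for metabolite in reactants:
            consumers[metabolite].add(name)
        for metabolite in products:
            producers[metabolite].add(name)

    result: Set[str] = set()
    for index in (consumers, producers):
        for rxns in index.values():
            if len(rxns) == 1:  # exactly one reaction touches this metabolite on this side
                result |= rxns
    return result


# Toy network: r1 and r2 are alternative routes from A to B; r3 is the only way past B.
toy = {
    "r1": (["A"], ["B"]),
    "r2": (["A"], ["B"]),
    "r3": (["B"], ["C"]),
}
print(chokepoints(toy))
```

In the toy network only `r3` qualifies, since `A` and `B` each have two consuming or producing alternatives through `r1` and `r2`.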
26 Reachability Analysis of Metabolic Network
- Given:
  - A PGDB for an organism
  - A set of initial metabolites
- Infer:
  - What set of products can be synthesized by the small-molecule metabolism of the organism
- Can known growth medium yield known essential compounds?
- Romero and Karp, Pacific Symposium on Biocomputing, 2001
27 Algorithm: Forward Propagation Through Production System
- Each reaction becomes a production rule
- Each metabolite in nutrient set becomes an axiom
[Diagram labels: Nutrient set, Metabolite set, "Fire" reactions, Transport, Products, Reactants, PGDB reaction pool]
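Forward propagation of this kind can be sketched as a fixed-point loop: start from the nutrient set, fire every reaction whose reactants are all available, add its products, and repeat until nothing new appears. This is a minimal sketch, not the published implementation; the function name, the reaction encoding, and the toy data are hypothetical, and transport and reversibility are ignored.

```python
from typing import Dict, Iterable, Set, Tuple

Reaction = Tuple[Iterable[str], Iterable[str]]  # (reactants, products)


def forward_propagate(nutrients: Iterable[str],
                      reactions: Dict[str, Reaction]) -> Tuple[Set[str], Set[str]]:
    """Return (reachable metabolites, fired reactions) starting from the nutrients."""
    reached = set(nutrients)   # axioms: every nutrient is available from the start
    fired: Set[str] = set()
    changed = True
    while changed:             # iterate until no further reaction can fire
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and set(reactants) <= reached:
                fired.add(name)            # the production rule fires once
                reached |= set(products)   # its products join the metabolite set
                changed = True
    return reached, fired


pool = {
    "r1": (["A", "B"], ["C"]),
    "r2": (["C"], ["D"]),
    "r3": (["X"], ["Y"]),      # never fires: X is not reachable
}
reached, fired = forward_propagate({"A", "B"}, pool)
essential = {"D", "Y"}
print(sorted(reached), sorted(essential - reached))
```

Comparing the reachable set against a list of essential compounds, as in the last line, is the check the reachability slide poses: anything left in `essential - reached` points to a gap in the nutrient set or in the network.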
29 Results
- Phase I: Forward propagation
  - 21 initial compounds yielded only half of the 41 essential compounds for E. coli
- Phase II: Manually identify
  - Bugs in EcoCyc (e.g., two objects for tryptophan)
    - A → B, B' → C
  - Incomplete knowledge of E. coli metabolic network
    - A + B → C + D
  - "Bootstrap compounds"
  - Missing initial protein substrates (e.g., ACP)
    - Protein synthesis not represented
- Phase III: Forward propagation with 11 more initial metabolites
  - Yielded all 41 essential compounds
30 Integration with other efforts
- Export of
  - BioPAX
  - SBML
- Import of
  - Enzyme DB (EC hierarchy of reactions)
  - GO
  - NCBI Taxonomy
  - BioPAX (work in progress)
31 Near Future
- Signalling pathways
  - Validating the design
- Regulation
  - Small RNAs, and other additional types
- Higher Eukaryotes
  - Gene expression, multiple splice forms
  - Cell types, localization
32 Summary
- Pathway/Genome Databases
  - MetaCyc: non-redundant DB of literature-derived pathways
  - 370 organism-specific PGDBs available through SRI at BioCyc.org
  - Computational theories of biochemical machinery
- Pathway Tools software
  - Extract pathways from genomes
  - Morph annotated genome into structured ontology
  - Distributed curation tools for MODs
  - Query, visualization, WWW publishing
33 BioCyc and Pathway Tools Availability
- BioCyc.org Web site and database files freely available to all
- Pathway Tools freely available to non-profits
  - Macintosh, PC/Windows, PC/Linux
- References
  - Pathway Tools User's Guide
    - Appendix A: Guide to the Pathway Tools Schema
  - Ontology Papers section of http://biocyc.org/publications.shtml
34 Acknowledgements
- SRI
  - Suzanne Paley, Ron Caspi, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Alex Shearer, Tomer Altman, Joe Dale, Fred Gilham, Pallavi Kaipa
- EcoCyc Collaborators
  - Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
- MetaCyc Collaborators
  - Sue Rhee, Peifen Zhang, Kate Dreher
  - Lukas Mueller, Anuradha Pujar
- Funding sources:
  - NIH National Center for Research Resources
  - NIH National Institute of General Medical Sciences
  - NIH National Human Genome Research Institute
- BioCyc.org
- Learn more from BioCyc webinars: biocyc.org/webinar.shtml
35 BioWarehouse: A Bioinformatics Database Warehouse
Peter D. Karp, Tom J. Lee, Valerie Wagner
[Diagram: source DBs UniProt, ENZYME, Genbank, Taxonomy, BioCyc, BioPAX, GO, MAGE-ML, KEGG, CMR, Eco2DBase feed BioWarehouse through Java-based or C-based loaders; BioWarehouse runs on Oracle (10g) or MySQL (4.1.11)]
BMC Bioinformatics 7:170 (2006)
bioinformatics.ai.sri.com/biowarehouse/
36 Motivations
- Hundreds of bioinformatics DBs exist
- Important problems involve queries across multiple DBs
37 Why is the Multidatabase Approach Alone Not Sufficient?
- Multidatabase query approaches assume databases are in a queryable DBMS
- Most sites that do operate DBMSs do not allow remote query access because of security and loading concerns
- Users want to control data stability
- Users want to control speed of their hardware
- Internet bandwidth limits query throughput
- Users need to capture, integrate and publish locally produced data of different types
- Multidatabase and Warehouse approaches are complementary
38 Key Challenges for BioWarehouse
- Designing a schema that accurately captures the contents of source DBs
- Designing a schema that is understandable and scalable
- Addressing poorly specified syntax & semantics of source DBs
- Balancing the preservation of source data with mapping into common semantics
39 Technical Approach
- Multi-platform support: Oracle (10g) and MySQL
- Schema support for multitude of bioinformatics datatypes
- Create loaders for public bioinformatics DBs
  - Parse file format of the source DB
  - Semantic transformations
  - Insert DB contents into warehouse tables
- Provide Warehouse query access mechanisms
  - SQL queries via ODBC, JDBC, OAA
- Operate public BioWarehouse server: publichouse
- BMC Bioinformatics 7:170 (2006)
40 PublicHouse Server
- Publicly queryable BioWarehouse server operated by SRI
- Manages a set of biological DBs constructed using BioWarehouse
  - CMR
  - Open BioCyc DBs
  - ENZYME
  - NCBI Taxonomy
  - UniProt
- Large-scale data mining using
  - Dashboard Warehouse Query Analyzer
  - MySQL client command line
- See: http://bioinformatics.ai.sri.com/biowarehouse/publichouse.html
  - Host: publichouse.sri.com
  - Port: 3306
  - Database: biospice
41 BioWarehouse Schema
- Manages many bioinformatics datatypes simultaneously
  - Pathways, Reactions, Chemicals
  - Proteins, Genes, Replicons
  - Sequences, Sequence Features
  - Organisms, Taxonomic relationships
  - Computations (sequence matches)
  - Citations, Controlled vocabularies
  - Links to external databases
  - Gene expression datasets
  - Protein-protein interaction datasets
  - Flow cytometry datasets
- Each type of warehouse object is implemented through one or more relational tables (currently ~150)
42 Warehouse Schema
- Manages multiple datasets simultaneously
  - Dataset = Single version of a database
- Version comparison
- Multiple software tools or experiments that require access to different versions
- Each dataset is a warehouse entity
- Every warehouse object is registered in a dataset
43 Warehouse Schema
- Different databases storing the same biological datatypes are coerced into the same warehouse tables
- Design of most datatypes inspired by multiple databases
- Representational tricks to decrease schema bloat
  - Single space of primary keys
  - Single set of satellite tables such as for synonyms, citations, comments, etc.
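The two representational tricks, together with dataset registration, can be illustrated with a toy schema in SQLite. This is a sketch under assumptions, not the BioWarehouse schema: the table and column names (`wid_counter`, `dataset_wid`, `other_wid`, and so on) and the sample rows are hypothetical. One counter issues keys for every table, so a single `synonym` table can point at a protein or a gene without a type column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wid_counter (next_wid INTEGER NOT NULL);  -- single space of primary keys
INSERT INTO wid_counter VALUES (1);
CREATE TABLE dataset (wid INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE protein (wid INTEGER PRIMARY KEY, dataset_wid INTEGER NOT NULL, name TEXT);
CREATE TABLE gene    (wid INTEGER PRIMARY KEY, dataset_wid INTEGER NOT NULL, name TEXT);
CREATE TABLE synonym (other_wid INTEGER NOT NULL, syn TEXT NOT NULL);  -- shared satellite table
""")


def new_wid(conn: sqlite3.Connection) -> int:
    """Issue the next key from the shared counter; keys are unique across all tables."""
    (wid,) = conn.execute("SELECT next_wid FROM wid_counter").fetchone()
    conn.execute("UPDATE wid_counter SET next_wid = next_wid + 1")
    return wid


def synonyms_of(conn: sqlite3.Connection, wid: int) -> list:
    """Look up synonyms for any warehouse object, whatever table it lives in."""
    rows = conn.execute(
        "SELECT syn FROM synonym WHERE other_wid = ? ORDER BY syn", (wid,))
    return [row[0] for row in rows]


# A dataset is itself a warehouse entity; every object records the dataset it belongs to.
ds = new_wid(conn)
conn.execute("INSERT INTO dataset VALUES (?, ?, ?)", (ds, "ExampleDB", "1.0"))
prot = new_wid(conn)
conn.execute("INSERT INTO protein VALUES (?, ?, ?)", (prot, ds, "example protein"))
gene = new_wid(conn)
conn.execute("INSERT INTO gene VALUES (?, ?, ?)", (gene, ds, "exampleGene"))
conn.executemany("INSERT INTO synonym VALUES (?, ?)",
                 [(prot, "protein alias"), (gene, "gene alias")])

print(synonyms_of(conn, prot), synonyms_of(conn, gene))
```

Loading a second version of the same source would add another `dataset` row and a fresh set of objects pointing at it, which is what makes version comparison a plain SQL join on `dataset_wid`.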