Published by Nathan Butler. Modified over 9 years ago.
1
Computational Exploration of Metabolic Networks with Pathway Tools
Part 1: Overview & Representations
Suzanne Paley
Bioinformatics Research Group, SRI International
paley@ai.sri.com
http://BioCyc.org/
2
SRI International Bioinformatics
Motivation: Theories of Cellular Function Too Large for One Mind to Grasp
- Example: E. coli metabolic network
  - 160 pathways involving 744 reactions and 791 substrates
- Example: E. coli genetic network
  - Control by 97 transcription factors of 1174 genes in 630 transcription units
- Past solutions:
  - Partition theories across multiple minds
  - Encode theories in natural-language text
- We cannot compute with theories in those forms
  - Evaluate theories for consistency with new data: microarrays
  - Refine theories with respect to new data
  - Compare theories describing different organisms
3
Solution: Biological Knowledge Bases
- Store biological knowledge and theories in computers in a declarative form
  - Amenable to computational analysis and generative user interfaces
- Establish ongoing efforts to curate (maintain, refine, embellish) these knowledge bases
- A high-quality, comprehensive knowledge base enables us to ask and answer important new questions
4
Terminology
- Model Organism Database (MOD): DB describing genome and other information about an organism
- Pathway/Genome Database (PGDB): MOD that combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors, promoters, operons, DNA binding sites
- BioCyc: Collection of 15 PGDBs at BioCyc.org
  - EcoCyc, AgroCyc, HumanCyc
5
Pathway Tools Software
- PathoLogic
  - Prediction of metabolic network from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks
6
Pathway Tools Software
[Diagram components: Pathway/Genome Databases; Pathway/Genome Navigator; PathoLogic Pathway Predictor; Pathway/Genome Editors]
7
Pathway/Genome Database
[Diagram of a cell, labeled: Chromosomes, Plasmids; Genes; Proteins; Reactions; Pathways; Compounds; Operons, Promoters, DNA Binding Sites]
8
Pathway Tools Algorithms
Visualization and editing tools for the following datatypes:
- Full Metabolic Map
  - Paint gene expression data on metabolic network; compare metabolic networks
- Pathways
  - Pathway prediction
- Reactions
  - Balance checker
- Compounds
  - Chemical substructure comparison
- Enzymes, Transporters, Transcription Factors
- Genes
- Chromosomes
- Operons
  - Operon prediction; visualize genetic network
9
Definitions
- Chemical reactions interconvert chemical compounds
  - Example: A + B → C + D
- An enzyme is a protein that accelerates chemical reactions
- A pathway is a linked set of reactions
  - Example: A → C → E
  - Often regulated as a unit
  - A conceptual unit of the cell's biochemical machine
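The definitions on this slide can be made concrete with a small sketch. `Reaction` and `is_linked` are hypothetical names chosen for illustration and are not part of Pathway Tools; the sketch assumes the slide's two examples read A + B -> C + D and A -> C -> E.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A reaction interconverts compounds: left side -> right side."""
    left: frozenset
    right: frozenset


def is_linked(reactions):
    """True if every reaction yields a compound that the next one consumes."""
    return all(a.right & b.left for a, b in zip(reactions, reactions[1:]))


r1 = Reaction(frozenset({"A"}), frozenset({"C"}))            # A -> C
r2 = Reaction(frozenset({"C"}), frozenset({"E"}))            # C -> E
r3 = Reaction(frozenset({"A", "B"}), frozenset({"C", "D"}))  # A + B -> C + D

pathway = [r1, r2]  # the linked set A -> C -> E
```

Here `is_linked(pathway)` holds because C is produced by the first step and consumed by the second; reversing the order breaks the link.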
20
Operations of the Metabolic Overview
- Find pathways, compounds
- Find reactions
  - By enzyme name, EC number, substrates, modulation
  - All with isozymes
  - All occurring in multiple pathways
  - By EC class, pathway class
- Find genes
  - By name, gene class
  - All regulated by transcriptional regulator protein
21
Metabolic Overview Queries
- Species comparison
  - Highlight reactions that are
    - Shared/not-shared with
    - Any-one/All-of
    - A specified set of species
- Overlay expression data
  - Colors reflect expression level and are user-configurable
  - Can show a single experiment or an animated time series
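The species-comparison query reduces to set operations over each organism's reactions. The sketch below is one reading of the slide, not the Navigator's code: `highlight` is a hypothetical function, the reaction IDs are made up, and "not-shared" is assumed to mean the complement of the shared set within the reference organism.

```python
def highlight(reactions_by_species, reference, others, mode="any", shared=True):
    """Pick the reference organism's reactions to highlight.

    mode="any": compare against reactions found in at least one other species.
    mode="all": compare against reactions found in every other species.
    shared=False returns the reference reactions outside that comparison set.
    """
    ref = reactions_by_species[reference]
    sets = [reactions_by_species[s] for s in others]
    if mode == "any":
        pool = set().union(*sets)
    else:
        pool = set.intersection(*sets) if sets else set()
    return ref & pool if shared else ref - pool


# Toy data: the species keys and reaction IDs are illustrative only.
data = {
    "ecoli": {"R1", "R2", "R3"},
    "bsub": {"R1", "R2"},
    "hpylori": {"R2", "R4"},
}
shared_any = highlight(data, "ecoli", ["bsub", "hpylori"], mode="any")
shared_all = highlight(data, "ecoli", ["bsub", "hpylori"], mode="all")
unique = highlight(data, "ecoli", ["bsub", "hpylori"], mode="any", shared=False)
```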
22
EcoCyc Project
- E. coli Encyclopedia
  - Model-Organism Database for E. coli
  - Began in 1992 as a collaboration between Karp and Riley
  - Over 3500 literature citations
- Collaborative development via Internet
  - Karp (SRI) -- Bioinformatics architect
  - John Ingraham -- Advisor
  - (SRI) -- Metabolic pathways
  - Saier (UCSD) and Paulsen (TIGR) -- Transport
  - Collado (UNAM) -- Regulation of gene expression
- Ontology: 1000 biological classes
- Database content: 17,700 instances
23
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
http://BioCyc.org/
- Genes: 4,393
- Proteins: 4,273
- Reactions: 2,760
- Pathways: 165
- Compounds: 774
- Transcription Units: 724
- Factors: 110
- Enzymes: 914
- Transporters: 162
- Promoters: 812
- TransFac Sites: 956
- Citations: 3,508
24
MetaCyc: Metabolic Encyclopedia
- Nonredundant metabolic pathway database
- Describes a representative sample of every experimentally determined metabolic pathway
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- 460 pathways, 1267 enzymes, 4294 reactions
  - 172 E. coli pathways, 2735 citations
- Nucleic Acids Research 30:59-61, 2002
- Jointly developed by SRI and Carnegie Institution
  - New focus on plant pathways
25
MetaCyc Data
- MetaCyc contains one DB object for each distinct pathway
  - Distinct in terms of reaction steps
  - Each pathway is labeled with the species it occurs in
- MetaCyc pathways are experimentally determined
- 4218 reactions in MetaCyc
  - 401 lack EC numbers
26
MetaCyc Enzyme Data
- Reaction(s) catalyzed
- Alternative substrates
- Cofactors / prosthetic groups
- Activators and inhibitors
- Subunit structure
- Molecular weight, pI
- Comment, literature citations
- Species
27
MetaCyc Frequent Organisms
Organism                    Count
Escherichia coli            156
Arabidopsis thaliana        47
Homo sapiens                30
Pseudomonas                 21
Bacillus subtilis           20
Salmonella typhimurium      20
Sulfolobus solfataricus     18
Pseudomonas putida          14
Saccharomyces cerevisiae    14
Haemophilus influenzae      13
Glycine max                 11
Deinococcus radiodurans     10
28
EcoCyc and MetaCyc
- Review-level databases
- Data derived primarily from biomedical literature
  - Manual entry by staff curators
  - Updates by staff curators only
- Data validation
  - Consistency constraints
  - Lisp programs that verify other semantic relationships
    - Unbalanced chemical reactions
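The unbalanced-reaction check mentioned on this slide can be illustrated with a short element-counting sketch. This is not the Lisp validation code the slide refers to: `element_totals` and `is_balanced` are hypothetical helpers, formulas are assumed to be given as element-count dictionaries, and charge is ignored.

```python
from collections import Counter


def element_totals(side):
    """Sum element counts over (coefficient, formula) pairs on one side."""
    totals = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            totals[element] += coefficient * count
    return totals


def is_balanced(left, right):
    """A reaction is balanced when both sides contain the same atoms."""
    return element_totals(left) == element_totals(right)


H2 = {"H": 2}
O2 = {"O": 2}
H2O = {"H": 2, "O": 1}

balanced = is_balanced([(2, H2), (1, O2)], [(2, H2O)])    # 2 H2 + O2 = 2 H2O
unbalanced = is_balanced([(1, H2), (1, O2)], [(1, H2O)])  # H2 + O2 = H2O
```

A curation pipeline could run such a check over every reaction frame and flag the failures for a curator to review.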
29
Computationally-Derived PGDBs
[Diagram: PathoLogic software integrates genome and pathway data to identify putative metabolic networks.
 Inputs: Annotated Genomic Sequence (Genes/ORFs, Gene Products, DNA Sequences) and the multi-organism pathway database MetaCyc (Reactions, Pathways, Compounds).
 Output: Pathway/Genome Database (Genomic Map, Genes, Gene Products, Reactions, Pathways, Compounds)]
30
PathoLogic Input/Output
- Inputs:
  - File listing genetic elements
    - http://bioinformatics.ai.sri.com/ptools/genetic-elements.dat
  - Files containing DNA sequence for each genetic element
  - Files containing annotation for each genetic element
  - MetaCyc database
- Output:
  - Pathway/genome database for the subject organism
  - Directory tree for the subject organism
  - Reports that summarize:
    - Evidence contained in the input genome for the presence of reference pathways
    - Reactions missing from inferred pathways
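The two report types listed above can be pictured with a toy calculation. The following is only an illustration of the idea, not PathoLogic's actual inference or scoring; `pathway_report`, the pathway names, and the reaction IDs are all hypothetical.

```python
def pathway_report(reference_pathways, catalyzed_reactions):
    """For each reference pathway, list reactions with and without an enzyme.

    reference_pathways: pathway name -> ordered list of reaction IDs
    catalyzed_reactions: reaction IDs with an enzyme in the annotated genome
    """
    report = {}
    for name, reactions in reference_pathways.items():
        present = [r for r in reactions if r in catalyzed_reactions]
        missing = [r for r in reactions if r not in catalyzed_reactions]
        report[name] = {
            "present": present,
            "missing": missing,
            "fraction": len(present) / len(reactions),
        }
    return report


# Toy inputs standing in for MetaCyc pathways and an annotated genome.
reference = {"P1": ["R1", "R2", "R3"], "P2": ["R4", "R5"]}
catalyzed = {"R1", "R3", "R9"}
report = pathway_report(reference, catalyzed)
```

In this toy case P1 has evidence for two of its three reactions and one missing reaction (R2), while P2 has no evidence at all.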
31
PathoLogic Functionality
- Initialize schema for new PGDB
- Transform existing genome to PGDB form
- Infer metabolic pathways and store in PGDB
- Infer operons and store in PGDB
- Assist user with manual tasks
  - Assign enzymes to reactions they catalyze
  - Identify false-positive pathway predictions
  - Build protein complexes from monomers
  - Assemble Overview diagram
32
BioCyc Collection of Pathway/Genome DBs
http://BioCyc.org/
- Literature-based datasets:
  - Escherichia coli (EcoCyc)
  - MetaCyc
- PGDBs at other sites:
  - Arabidopsis thaliana (TAIR)
  - Methanococcus jannaschii (EBI)
  - Saccharomyces cerevisiae (SGD)
  - Synechocystis PCC6803
- Computationally-derived datasets:
  - Agrobacterium tumefaciens
  - Caulobacter crescentus
  - Chlamydia trachomatis
  - Bacillus subtilis
  - Helicobacter pylori
  - Haemophilus influenzae
  - Homo sapiens
  - Mycobacterium tuberculosis H37Rv
  - Mycobacterium tuberculosis CDC1551
  - Mycoplasma pneumoniae
  - Pseudomonas aeruginosa
  - Treponema pallidum
  - Vibrio cholerae
[Legend: Yellow = Open Database; the color coding is not preserved in this transcript]
33
HumanCyc: Human Metabolic Pathway Database
- PGDB of human metabolic pathways built using PathoLogic
- Contains information on 28,700 genes, their products, and the metabolic reactions and pathways they catalyze (no signalling pathways)
- Chromosome and contigs from Ensembl
- Human genetic loci from LocusLink
- Mitochondrion data from GenBank
- Ensembl and LocusLink gene entries were merged to eliminate redundancies where possible
- Contains links to human genome web sites
- Plan to hire one curator to refine and curate with respect to literature over a 2-year period
  - Remove false-positive predictions
  - Insert known pathways missed by PathoLogic
  - Add comments and citations from pathways and enzymes to the literature
  - Add enzyme activators, inhibitors, cofactors, tissue information
- Funded by commercial consortium
34
BioCyc and Pathway Tools Availability
- WWW BioCyc freely available to all
  - BioCyc.org
  - Six BioCyc DBs openly available to all
- BioCyc DBs freely available to non-profits
  - Flatfiles downloadable from BioCyc.org
  - Binary executable:
    - Sun UltraSparc-170 w/ 64MB memory
    - PC, 400MHz CPU, 64MB memory, Windows-98 or newer
  - PerlCyc API
- Pathway Tools freely available to non-profits
35
Information Sources
- Pathway Tools User's Guide
  - aic-export/ecocyc/genopath/released/doc/userguide1.pdf
    - Pathway/Genome Navigator
    - Appendix A: Guide to the Pathway Tools Schema
  - aic-export/ecocyc/genopath/released/doc/userguide2.pdf
    - PathoLogic, Editing Tools
- Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
  - Publications, programming examples, etc.
- Pathway Tools Tutorial
  - http://bioinformatics.ai.sri.com/ptools/tutorial/
36
Pathway Tools Implementation Details
- Allegro Common Lisp
- Sun and PC platforms
- Ocelot object database
- 250,000 lines of code
- Lisp-based WWW server at BioCyc.org
  - Manages 15 PGDBs
37
Frame Data Model
- Frame Data Model -- organizational structure for a PGDB
- Knowledge base (KB, Database, DB)
- Frames
- Slots
38
Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program
39
Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes have:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame
40
Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a "slot frame" in a KB that defines meta information about that slot
41
Properties of Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any LISP object: integer, real, string, symbol (frame name)
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
  - Slot Gene in class Polypeptides
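The frame, slot, and inverse-slot ideas from the last few slides can be sketched in a few lines. This is a toy model in Python, not the Ocelot or Pathway Tools API: the `KB` class and its methods are hypothetical, and the frame ID `TrpA-polypeptide` is invented for the example (only `trpA`, `Product`, and `Gene` come from the slides).

```python
class KB:
    """Toy knowledge base: frames hold multivalued slots."""

    def __init__(self):
        self.frames = {}    # frame id -> {slot name -> list of values}
        self.inverses = {}  # slot name -> name of its inverse slot

    def declare_inverse(self, slot, inverse):
        """Record that two slots encode opposite relationships."""
        self.inverses[slot] = inverse
        self.inverses[inverse] = slot

    def put(self, frame, slot, value):
        """Add a slot value and keep the inverse slot in sync."""
        values = self.frames.setdefault(frame, {}).setdefault(slot, [])
        if value not in values:
            values.append(value)
        inverse = self.inverses.get(slot)
        if inverse is not None:
            back = self.frames.setdefault(value, {}).setdefault(inverse, [])
            if frame not in back:
                back.append(frame)

    def get(self, frame, slot):
        """Return a copy of the slot's values (empty list if unset)."""
        return list(self.frames.get(frame, {}).get(slot, []))


kb = KB()
kb.declare_inverse("Product", "Gene")
kb.put("trpA", "Product", "TrpA-polypeptide")
gene_of_product = kb.get("TrpA-polypeptide", "Gene")
```

Setting `Product` on the gene frame fills in `Gene` on the polypeptide frame, so the relationship can be followed from either end.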
42
Pathway Tools Ontology
- 1064 classes
  - Main classes such as:
    - Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions, Compounds
- 205 slots
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Classes, instances, and slots are all stored side by side in the DBMS and share a single namespace
43
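Slot Links from Gene to Pathway Frame
[Diagram: frames connected by slot links.
 Genes on Chrom: sdhA, sdhB, sdhC, sdhD
 Polypeptides (linked from genes by product): Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
 Protein complex (linked by component-of): Succinate dehydrogenase
 Enzymatic-reaction (linked by catalyzes)
 Reaction (linked by reaction): succinate + FAD = fumarate + FADH2, with left = succinate, FAD and right = fumarate, FADH2
 Pathway (linked by in-pathway): TCA Cycle]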
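The chain of slot links in this diagram can be walked programmatically. The sketch below uses plain dictionaries rather than any Pathway Tools API; `follow` is a hypothetical helper, and the frame IDs `Sdh-enzrxn` and `Sdh-rxn` are invented because the diagram shows those frames only by their content.

```python
# Frames from the diagram, keyed by frame ID; each slot holds a list of values.
FRAMES = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Sdh-enzrxn"]},
    "Sdh-enzrxn": {"reaction": ["Sdh-rxn"]},
    "Sdh-rxn": {
        "left": ["succinate", "FAD"],
        "right": ["fumarate", "FADH2"],
        "in-pathway": ["TCA Cycle"],
    },
}

GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]


def follow(frames, start, slots):
    """Follow a sequence of slots from one frame; return the frames reached."""
    current = {start}
    for slot in slots:
        current = {
            value
            for frame in current
            for value in frames.get(frame, {}).get(slot, [])
        }
    return current


pathways = follow(FRAMES, "sdhA", GENE_TO_PATHWAY)
```

This fixed slot sequence fits the multimer case drawn here; for a monomeric enzyme the `component-of` step would be skipped.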
44
Enzymatic-reaction frame stores properties of the pairing between enzyme and reaction
[Diagram: the same frames as the previous slide (genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction succinate + FAD = fumarate + FADH2; TCA Cycle), annotated with example slots: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]
45
Monofunctional Monomer
[Diagram: Gene, Monomer, Enzymatic-reaction, Reaction, Pathway]
46
Bifunctional Monomer
[Diagram: Gene, Monomer, two Enzymatic-reaction frames, two Reaction frames, Pathway]
47
Monofunctional Multimer
[Diagram: Gene, Monomer, Multimer, Enzymatic-reaction, Reaction, Pathway]
48
Pathway and Substrates
[Diagram: a Pathway linked to its Reactions by in-pathway; a Reaction linked to Reactant-1 and Reactant-2 by left, and to Product-1 and Product-2 by right]
49
Genetic Network Representation
- Describe biological entities involved in control of transcription initiation
  - Promoters, operators, transcription factors, operons, terminators
- Describe molecular interactions among these entities
  - Modulation of transcription factor activity
  - Binding of transcription factors to DNA binding sites
  - Effects on transcription initiation
50
Ontology for Transcriptional Regulation
- One DB object defined for each biological entity and for each molecular interaction
[Diagram labels: site001, pro001, trpE, trpD, trpC, trpB, trpA, trpL, Int002, RpoSig70, TrpR*trp, Int001, trpLEDCBA, trp, apoTrpR, complexation reaction]
Int001 (binding of TrpR*trp to site001) inhibits Int002 (binding of RNA polymerase to the promoter) and consequently prevents transcription of the genes in the transcription unit.
51
Principal Classes
Class names are capitalized and plural.
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes
52
Principal Classes (continued)
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements
53
Slots in Multiple Classes
- Common-Name
- Synonyms
- Names (computed as union of Common-Name, Synonyms)
- Comment
- Citations
- DB-Links
54
Genes Slots
- Chromosome
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product
55
Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of
56
Polypeptides Slots
- Gene
57
Protein-Complexes Slots
- Components
58
Reactions Slots
- EC-Number
- Left, Right
- Substrates (computed as union of Left, Right)
- Enzymatic-Reaction
- DeltaG0
- Spontaneous?
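A computed slot such as Substrates is derived from stored slots rather than stored itself. A minimal sketch follows, assuming a reaction frame is a dictionary of slot lists; `substrates` is a hypothetical helper, and the example reaction is the succinate dehydrogenase reaction from the earlier slide.

```python
def substrates(reaction):
    """Compute Substrates as the union of Left and Right, preserving order."""
    seen = []
    for compound in reaction.get("Left", []) + reaction.get("Right", []):
        if compound not in seen:
            seen.append(compound)
    return seen


rxn = {"Left": ["succinate", "FAD"], "Right": ["fumarate", "FADH2"]}
all_substrates = substrates(rxn)
```

The Names slot listed among the shared slots (union of Common-Name and Synonyms) could be derived the same way.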
59
Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors
- Reaction-Direction
60
Pathways Slots
- Reaction-List
- Predecessors
- Primaries
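One way to picture how Reaction-List and Predecessors work together is to order the reactions of a pathway. The slide does not spell out the format of Predecessors, so this sketch assumes each value pairs a reaction with the reaction that precedes it; `ordered_reactions` and the reaction IDs are hypothetical. It relies on `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter


def ordered_reactions(reaction_list, predecessors):
    """Order reactions so that each one follows its predecessors.

    predecessors: iterable of (reaction, preceding_reaction) pairs (assumed format).
    """
    graph = {reaction: set() for reaction in reaction_list}
    for reaction, preceding in predecessors:
        graph[reaction].add(preceding)
    return list(TopologicalSorter(graph).static_order())


reaction_list = ["R3", "R1", "R2"]             # unordered set of steps
predecessors = [("R2", "R1"), ("R3", "R2")]    # R1 precedes R2, R2 precedes R3
order = ordered_reactions(reaction_list, predecessors)
```

A cyclic pathway such as the TCA cycle has no topological order, so `static_order` would raise `graphlib.CycleError`; a real tool would need to treat cycles separately.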