Published by Claribel Simon. Modified over 9 years ago.
1
Overview of the Pathway Tools Software and Pathway/Genome Databases
Peter D. Karp
Bioinformatics Research Group, SRI International
pkarp@ai.sri.com
2
Pathway/Genome Database: Integrating Genomic and Biochemical Data
[Diagram labels: CELL; Chromosomes, Plasmids; Genes; Proteins; Reactions; Pathways; Compounds; Operons, Promoters, DNA Binding Sites]
3
Key Functionality
- Pathway analysis
  - Prediction of pathways from genomes
  - Comparative pathway analysis
- Ongoing curation of PGDBs
- WWW publishing of PGDBs
- Analysis of gene expression data
4
Tools and Datasets
[Diagram: a PGDB (Pathways, Genes) and the three tools that operate on it]
- PathoLogic: Create PGDBs
- Pathway/Genome Navigator: Visualize, Query and Analyze PGDBs
- Editors: Update PGDBs
5
PathoLogic Pathway Predictor
[Diagram labels: Set of Annotated Genes; MetaCyc PGDB; Pathway Prediction; New PGDB; Reports]
6
Prediction of Pathways from Genomes
[Diagram labels: DNA Sequence; List of Genes/ORFs; List of Gene Products; Annotated Genome; PathoLogic; Pathway/Genome Database (Genomic Map, Genes, Proteins, Reactions, Compounds, Pathways, Metabolic Network)]
7
MetaCyc Overview
- Meta Metabolic Encyclopedia
- 439 pathways, 1095 enzymes, 4217 reactions
  - 173 E. coli pathways
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- Editor in chief: Dr. Monica Riley
8
Pathway/Genome Navigator
- Query and visualization tools for PGDBs
  - Metabolic pathways, reactions, compounds
  - Enzymes, transporters, transcription factors
  - Genome maps, genes, operons, promoters, DNA sites
  - Retrieve nucleotide and DNA sequences
  - Perform BLAST searches
- Runs as an application on Solaris, Windows
- Runs as a WWW server on Solaris
- Query and comparative analysis functions
9
Interactive Editing Tools
- Pathway editor
- Reaction editor
- Gene editor
- Enzyme editor
- Compound editor
- Transcription Unit Editor
- Facilitate updates to PGDBs
  - Improved computational predictions
  - Literature-based data
- Record citations, comments, evidence, history
10
Pathway Views of Expression Data
- Import gene expression data
- Compute expression ratios
- Obtain pathway-based visualizations of data
  - Numerical spectrum of expression values mapped to a color spectrum
  - Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step
  - Absolute or relative expression values
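The mapping from a numerical spectrum to a color spectrum can be illustrated with a minimal sketch. It assumes simple linear interpolation between two end colors; the function names, the blue-to-red endpoints, and the clamping behavior are illustrative assumptions, not the color scheme Pathway Tools actually uses.

```python
def expression_ratio(experimental: float, control: float) -> float:
    """Relative expression value: ratio of two absolute measurements."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return experimental / control


def value_to_rgb(value, lo, hi, low_color=(0, 0, 255), high_color=(255, 0, 0)):
    """Linearly interpolate a value in [lo, hi] onto a two-color spectrum.

    The end colors are hypothetical defaults chosen for illustration.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp so that out-of-range values take the end colors.
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return tuple(round(a + t * (b - a)) for a, b in zip(low_color, high_color))
```

Each step of the overview would then be painted with the color computed for the gene(s) encoding its enzyme(s), using either absolute values or ratios as input.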
11
Environment for Computational Exploration of Genomes
- Powerful ontology opens many facets of the biology to computational exploration
- Global characterization of metabolic network
- Analysis of interface between transport and metabolism
- Nutrient analysis of metabolic network
12
PathoLogic Pathway Predictor
13
PathoLogic Pathway Predictor
- Introduction
- Description of PPP execution
- Inputs to PPP
- Using the GUI to create a pathway/genome database
- Output from PPP
- Caveats
14
PathoLogic Goals
- Create the set of class frames that encode DB schema
  - Copied from MetaCyc
- Create the appropriate set of instance frames
  - Genes, genetic elements, proteins created from input files
  - Substrates, reactions, and pathways are copied from the reference database
- Interconnect frames in a manner that accurately reflects their semantic relationships
15
PathoLogic Input/Output
Inputs:
- File listing genetic elements
  - http://bioinformatics.ai.sri.com/ptools/genetic-elements.dat
- Files containing DNA sequence for each genetic element
- Files containing annotation for each genetic element
- MetaCyc database
Output:
- Pathway/genome database for the subject organism
- Directory tree for the subject organism
- Reports that summarize:
  - Evidence contained in the input genome for the presence of reference pathways
  - Reactions missing from inferred pathways
16
Inputs to PathoLogic Pathway Predictor
- genetic-elements.dat
- Sequence files
- GenBank file format
- PathoLogic format
- Directory Structure
17
genetic-elements.dat
ID	TEST-CHROM-1
NAME	Chromosome 1
TYPE	:CHRSM
CIRCULAR?	N
ANNOT-FILE	chrom1.pf
SEQ-FILE	chrom1.fsa
//
ID	TEST-CHROM-2
NAME	Chromosome 2
CIRCULAR?	N
ANNOT-FILE	/mydata/chrom2.gbk
SEQ-FILE	/mydata/chrom2.fna
//
18
File Naming Conventions
- One pair of sequence and annotation files for each genetic element
- Sequence files: FASTA format
  - suffix .fsa or .fna
- Annotation file:
  - GenBank format: suffix .gbk
  - PathoLogic format: suffix .pf
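A quick pre-flight check of these naming conventions could look like the sketch below. The function name `check_file_pair` and its error messages are hypothetical; only the suffixes come from the slide.

```python
from pathlib import PurePosixPath

# Suffixes taken from the naming conventions above.
SEQ_SUFFIXES = {".fsa", ".fna"}                       # FASTA sequence files
ANNOT_FORMATS = {".gbk": "GenBank", ".pf": "PathoLogic"}


def check_file_pair(annot_file: str, seq_file: str) -> str:
    """Validate one annotation/sequence pair and return the annotation format.

    Hypothetical helper: raises ValueError when a suffix is not one of the
    accepted ones.
    """
    annot_suffix = PurePosixPath(annot_file).suffix
    seq_suffix = PurePosixPath(seq_file).suffix
    if annot_suffix not in ANNOT_FORMATS:
        raise ValueError(f"unexpected annotation suffix: {annot_file}")
    if seq_suffix not in SEQ_SUFFIXES:
        raise ValueError(f"unexpected sequence suffix: {seq_file}")
    return ANNOT_FORMATS[annot_suffix]
```

Running it once per genetic element before a trial parse would catch a misnamed file early.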
19
GenBank File Format
- Accepted feature types:
  - CDS, tRNA, rRNA, misc_RNA
- Accepted qualifiers:
  - /label: Unique ID [recm]
  - /gene: Gene name [req]
  - /product [req]
  - /EC_number [recm]
  - /product_comment [opt]
  - /gene_comment [opt]
  - /alt_name: Synonyms [opt]
- For multifunctional proteins, put each function in a separate /product line
20
Typical Problems Using GenBank Files With PathoLogic
- Wrong qualifier names used
- Extraneous information in a given qualifier
- Check results of trial parse carefully
21
PathoLogic File Format
- Each record starts with a line containing an ID attribute
- Tab delimited
- Each record ends with a line containing //
- One attribute-value pair is allowed per line
  - Use multiple FUNCTION lines for multifunctional proteins
- Lines starting with ‘;’ are comment lines
- Valid attributes are:
  - ID, NAME, SYNONYM
  - STARTBASE, ENDBASE, GENE-COMMENT
  - FUNCTION, PRODUCT-TYPE, EC, FUNCTION-COMMENT
  - DBLINK
22
PathoLogic File Format
ID	TP0734
NAME	deoD
STARTBASE	799084
ENDBASE	799785
FUNCTION	purine nucleoside phosphorylase
DBLINK	PID:g3323039
PRODUCT-TYPE	P
GENE-COMMENT	similar to GP:1638807 percent identity: 57.51; identified by sequence similarity; putative
//
ID	TP0735
NAME	gltA
STARTBASE	799867
ENDBASE	801423
FUNCTION	glutamate synthase
DBLINK	PID:g3323040
PRODUCT-TYPE	P
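The record layout described for PathoLogic-format files (tab-delimited attribute-value lines, `//` terminators, `;` comments, repeated FUNCTION lines) can be read with a few lines of code. The sketch below is an illustration only, not the parser Pathway Tools uses; `parse_pf` is a hypothetical name, and storing every attribute as a list of values is an assumption made so that repeated FUNCTION lines are preserved.

```python
def parse_pf(text: str) -> list[dict[str, list[str]]]:
    """Parse PathoLogic-format text into a list of records.

    Each record maps an attribute name to the list of values seen for it.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue                      # skip blank and comment lines
        if line.strip() == "//":          # end of record
            if current:
                records.append(current)
            current = {}
            continue
        attribute, _, value = line.partition("\t")
        current.setdefault(attribute.strip(), []).append(value.strip())
    if current:                           # tolerate a missing final "//"
        records.append(current)
    return records
```

A multifunctional protein then shows up as a record whose `FUNCTION` list has more than one entry.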
23
Using the PPP GUI to Create a Pathway/Genome Database
- Input Project Information
  - Organism -> Create New
- Trial Parse
  - Build -> Trial Parse
- Build pathway/genome database
  - Build -> Automated Build
- Manual polishing
  - Refine -> Resolve Ambiguous Name Matches
  - Refine -> Assign Modified Proteins
  - Refine -> Create Protein Complexes
  - Refine -> Run Consistency Checker
  - Refine -> Update Overview
24
PathoLogic Command Menus
- Organism
  - Select
  - Create New
  - Save KB
  - Revert KB
  - Reinitialize KB
  - Exit
- Build
  - Trial Parse
  - Automated Build
- Refine
  - Resolve Ambiguous Name Matches
  - Assign Modified Proteins
  - Create Protein Complexes
  - Re-run Name Matcher
  - Rescore Pathways
  - Run Consistency Checker
  - Update Overview
25
Input Project Information
26
PathoLogic PP Parse Output
27
Enzyme Name to Reaction Mapping
28
Enzyme Name Matching Tool
- Dictionary of enzyme names assembled from:
  - All metabolic reactions found in MetaCyc
  - Two files that map synonyms not found in MetaCyc to reaction names:
    - System file (pangea-enzyme-mappings.dat)
    - User-supplied file (local-enzyme-mappings.dat)
- Location of sources:
  - $GPROOT/pathologic/$VERSION-NUMBER/data
29
Enzyme Name Matcher
- Matches on full enzyme name
- Match is case-insensitive and removes the punctuation characters “ -_(){}',:”
- Also matches after removal of prefixes and suffixes such as:
  - “Putative”, “Hypothetical”, etc.
  - alpha|beta|…|catalytic|inducible chain|subunit|component
  - Parenthetical gene name
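The matching rules above can be sketched as a two-stage key lookup: first on the full name, then on the name with decorations stripped. Everything below is an approximation under stated assumptions: the prefix and suffix lists are only the examples named on the slide (its "etc." and "…" are not filled in), the regular expressions are one possible reading of the rules, and all function names are hypothetical.

```python
import re

PUNCTUATION = " -_(){}',:"                     # characters ignored when matching
PREFIXES = ("putative", "hypothetical")        # partial list; slide says "etc."
SUFFIX_RE = re.compile(                        # partial list; slide elides some
    r"\s*,?\s*(alpha|beta|catalytic|inducible)\s+(chain|subunit|component)\s*$",
    re.IGNORECASE,
)
PAREN_RE = re.compile(r"\s*\([^()]*\)\s*$")    # trailing parenthetical, e.g. a gene name


def canonical(name: str) -> str:
    """Case-insensitive key with the punctuation characters removed."""
    return name.lower().translate(str.maketrans("", "", PUNCTUATION))


def candidate_keys(name: str) -> list[str]:
    """Keys to try in order: the full name first, then the stripped name."""
    keys = [canonical(name)]
    stripped = PAREN_RE.sub("", name.strip())
    stripped = SUFFIX_RE.sub("", stripped)
    for prefix in PREFIXES:
        if stripped.lower().startswith(prefix + " "):
            stripped = stripped[len(prefix) + 1:]
    key = canonical(stripped)
    if key not in keys:
        keys.append(key)
    return keys


def match_reaction(name: str, dictionary: dict):
    """Look a name up in a dictionary keyed by canonical enzyme names."""
    for key in candidate_keys(name):
        if key in dictionary:
            return dictionary[key]
    return None
```

Here `dictionary` stands in for the enzyme-name dictionary assembled from MetaCyc and the two mapping files; its values would be reaction identifiers.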
30
Enzyme Name Matcher
- For names that do not match, software identifies probable metabolic enzymes as those
  - Containing “ase”
  - Not containing keywords such as
    - “sensor kinase”
    - “topoisomerase”
    - “protein kinase”
    - “peptidase”
    - Etc.
- Research unknown enzymes
  - MetaCyc, Swiss-Prot, PIR, Medline, EMP
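The "probable metabolic enzyme" heuristic for unmatched names is simple enough to state directly in code. This is a sketch of the rule as described; the keyword tuple holds only the four examples listed on the slide, and the function name is hypothetical.

```python
# Only the keywords named on the slide; the real exclusion list is longer ("etc.").
EXCLUDED_KEYWORDS = ("sensor kinase", "topoisomerase", "protein kinase", "peptidase")


def is_probable_metabolic_enzyme(name: str) -> bool:
    """True if the name contains "ase" and none of the excluded keywords."""
    lowered = name.lower()
    if "ase" not in lowered:
        return False
    return not any(keyword in lowered for keyword in EXCLUDED_KEYWORDS)
```

Names flagged this way are the ones worth researching by hand in MetaCyc, Swiss-Prot, PIR, Medline, or EMP.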
31
Assigning Evidence Scores to Predicted Pathways
- X|Y|Z denotes the score for pathway P in organism O, where:
  - X = total number of reactions in P
  - Y = number of reactions in P for which there is evidence in O (an enzyme catalyzing the reaction)
  - Z = number of the Y reactions that are also used in other pathways in O
- Not clear how to convert these scores into a probability of occurrence
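The X|Y|Z score can be computed with plain set operations. The sketch below assumes that each candidate pathway is given as a set of reaction identifiers and that evidence is a set of reactions with an enzyme in the organism; these data structures and the function names are illustrative assumptions, not the PathoLogic implementation.

```python
def evidence_score(pathway: str, pathways: dict, reactions_with_evidence: set) -> tuple:
    """Return (X, Y, Z) for one pathway.

    pathways maps each candidate pathway name to its set of reaction ids.
    """
    reactions = pathways[pathway]
    present = reactions & reactions_with_evidence            # the Y reactions
    elsewhere = set().union(
        *(rxns for name, rxns in pathways.items() if name != pathway)
    )
    shared = present & elsewhere                             # the Z reactions
    return len(reactions), len(present), len(shared)


def format_score(score: tuple) -> str:
    """Render a score in the X|Y|Z notation."""
    return "|".join(str(n) for n in score)
```

A score whose Y and Z are equal means that every reaction with evidence is shared with some other pathway, which is weak evidence for this pathway in particular.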
32
Algorithm for Automated Pathway Pruning
- A pathway will never be pruned if it contains a unique enzyme, that is, an enzyme not present in any other pathway
- A pathway will be pruned if one of the following conditions holds:
  - Evidence is better for a different pathway in the same variant set
  - Evidence for only one reaction in pathway, or
  - Its set of reactions present is a proper subset of the reactions present in some other pathway, and
    - If pathway is a biosynthetic pathway, final reaction(s) missing
    - If pathway is a degradation pathway, initial reaction(s) missing
    - If pathway is an energy metabolism pathway, more than half the reactions are missing
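The pruning rules can be read as a decision procedure. The sketch below is one interpretation, with several simplifying assumptions stated up front: pathways are treated as linear so that "final" and "initial" reaction(s) reduce to the last and first list element, the unique-enzyme and better-variant facts are supplied as precomputed flags, and the `Candidate` class and `should_prune` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    kind: str                  # "biosynthesis", "degradation", "energy", or other
    reactions: list            # ordered first to last (assumes a linear pathway)
    present: set               # reactions with evidence in the organism
    has_unique_enzyme: bool = False
    better_variant_exists: bool = False   # better evidence in same variant set


def should_prune(p: Candidate, others: list) -> bool:
    """Apply the pruning rules to one candidate pathway."""
    if p.has_unique_enzyme:
        return False                       # never pruned
    if p.better_variant_exists:
        return True
    if len(p.present) == 1:
        return True                        # evidence for only one reaction
    # Proper-subset test against the reactions present in every other pathway.
    if not any(p.present < other.present for other in others):
        return False
    if p.kind == "biosynthesis":
        return p.reactions[-1] not in p.present       # final reaction missing
    if p.kind == "degradation":
        return p.reactions[0] not in p.present        # initial reaction missing
    if p.kind == "energy":
        missing = len(p.reactions) - len(p.present)
        return missing > len(p.reactions) / 2         # more than half missing
    return False
```

Branched pathways with several final or initial reactions would need a richer representation than an ordered list.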
33
Creating Protein Complexes
34
Complex Subunits, Stoichiometries
35
Proteins as Reaction Substrates
36
Manual Pruning of Pathways
- Use pathway evidence report
  - Coloring scheme aids in assessing pathway evidence
- Phase I: Prune extra variant pathways
- Rescore pathways, re-generate pathway evidence report
- Phase II: Prune pathways unlikely to be present
  - No/few unique enzymes
  - Most pathway steps present because they are used in another pathway
  - Pathway very unlikely to be present in this organism
37
Overview Graph
38
Output from PPP
- Pathway/genome database
- Summary pages
  - Pathway evidence page
    - Click “Summary of Organisms”, then click organism name, then click “Pathway Evidence”, then click “Save Pathway Report”
  - Missing enzymes report
- Directory tree containing sequence files, reports, etc.
39
Resulting Directory Structure
ROOT/aic-export/ecocyc/ORGIDcyc/VERSION/
- input
  - organism.dat
  - organism-init.dat
  - genetic-elements.dat
  - annotation files
  - sequence files
- reports
  - name-matching-report.txt
  - trial-parse-report.txt
- kb
  - ORGIDbase.ocelot
- data
  - overview.graph
- released -> VERSION
40
Caveats
- Cannot predict pathways not present in MetaCyc
- Evidence for short pathways is hard to interpret
- Because many reactions occur in many pathways, there are many false positives
41
The Pathway Tools Schema
42
Motivations for Understanding Schema
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
- When writing complex Lisp queries to PGDBs, those queries must name classes and slots within the schema
- A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
43
Reference
- Pathway Tools User’s Guide, Volume I
  - Appendix A: Guide to the Pathway Tools Schema
44
Web of Relationships for One Enzyme
[Diagram: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; complex Succinate dehydrogenase; Enzymatic-reaction; reaction Succinate + FAD = fumarate + FADH2; pathway TCA Cycle]
45
Frame Data Model and Schema
- Frame Data Model -- organizational principle for a DB
- Object Displays
- Schema
  - Gene slots
  - Polypeptide slots
  - Protein slots
  - Protein Complex slots
  - Reaction slots
  - Enzymatic Reaction slots
46
Frame Data Model
- Knowledge base (KB, Database, DB)
- Frames
- Slots
- Facets
- Annotations
47
Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program
48
Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame
49
Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot
50
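Slot Links
[Diagram: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; complex Succinate dehydrogenase; Enzymatic-reaction; reaction Succinate + FAD = fumarate + FADH2; pathway TCA Cycle. Links are labeled with slot names: product, component-of, catalyzes, reaction, in-pathway]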
51
Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any LISP object: Integer, real, string, symbol (frame name), list
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
  - Slot Gene in class Polypeptides
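Inverse slots are easiest to see in a toy model. The sketch below is not the Ocelot frame system or its API; it is a minimal dictionary-based illustration, with a hypothetical `KnowledgeBase` class, of how setting Product on a gene frame can keep Gene on the polypeptide frame in step.

```python
# Inverse slot pairs from the slide: Product (in Genes) and Gene (in Polypeptides).
INVERSES = {"Product": "Gene", "Gene": "Product"}


class KnowledgeBase:
    """Toy frame store: frame id -> {slot name -> list of values}."""

    def __init__(self):
        self.frames = {}

    def create_frame(self, frame_id: str) -> None:
        if frame_id in self.frames:
            raise ValueError(f"frame already exists: {frame_id}")
        self.frames[frame_id] = {}

    def add_value(self, frame_id: str, slot: str, value) -> None:
        """Add a slot value; if the slot has an inverse, update the other frame too."""
        values = self.frames[frame_id].setdefault(slot, [])
        if value not in values:
            values.append(value)
        inverse = INVERSES.get(slot)
        if inverse and value in self.frames:
            back = self.frames[value].setdefault(inverse, [])
            if frame_id not in back:
                back.append(frame_id)
```

For example, after creating frames `sdhA` and `Sdh-flavo`, calling `add_value("sdhA", "Product", "Sdh-flavo")` also records `sdhA` in the Gene slot of `Sdh-flavo`.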
52
Representation of Function
[Diagram: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; complex Succinate dehydrogenase; Enzymatic-reaction; reaction Succinate + FAD = fumarate + FADH2; pathway TCA Cycle. Attribute labels: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]
53
Monofunctional Monomer
[Diagram labels: Gene; Monomer; Enzymatic-reaction; Reaction; Pathway]
54
Bifunctional Monomer
[Diagram labels: Gene; Monomer; Enzymatic-reaction (x2); Reaction (x2); Pathway]
55
Monofunctional Multimer
[Diagram labels: Gene; Monomer; Multimer; Enzymatic-reaction; Reaction; Pathway]
56
Pathway and Substrates
[Diagram labels: Reactant-1; Reactant-2; Reaction (x2); Product-1; Product-2; Pathway. Links are labeled with slot names: in-pathway, left, right]
57
Transcriptional Regulation
[Diagram labels: site001; pro001; trpL, trpE, trpD, trpC, trpB, trpA; trpLEDCBA; RpoSig70; TrpR*trp; apoTrpR; trp; Int001; Int003; Int005]
58
Annotations
- Encode information about individual slot values
- Used to attach comments and citations to slot values
- Example:
  - Frame tryptophan-synthetase has a slot called Molecular-Weight with a value of 28
  - Attached to that value is an annotation whose label is Citation and whose value is “[3444332]”
59
Facets
- Encode information about slots
- Allow association between a slot and:
  - comments
  - citations
- Example: Comment attached to Inhibitors of EnzRxn
- Allow access to schema information
60
Principal Classes
- Class names are capitalized, plural
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes
61
Principal Classes
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements
62
Slots in Multiple Classes
- Common-Name
- Synonyms
- Names (computed as union of Common-Name, Synonyms)
- Comment
- Citations
- DB-Links
63
Genes Slots
- Chromosome
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product
64
Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of
65
Polypeptides Slots
- Gene
66
Protein-Complexes Slots
- Components
67
Reactions Slots
- EC-Number
- Left, Right
- Substrates (computed as union of Left, Right)
- DeltaG0
- Keq
- Spontaneous?
- Species
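Computed slots such as Substrates (and Names on the earlier slide) are derived from stored slots rather than entered by hand. A minimal sketch, assuming frames are plain dictionaries and that Common-Name holds a single string, might look like this; the function names are hypothetical and this is not how Pathway Tools stores or computes these slots internally.

```python
def union_in_order(*value_lists) -> list:
    """Union of several value lists, keeping first-seen order."""
    seen = []
    for values in value_lists:
        for value in values:
            if value not in seen:
                seen.append(value)
    return seen


def substrates(reaction: dict) -> list:
    """Substrates = union of the Left and Right slots."""
    return union_in_order(reaction.get("Left", []), reaction.get("Right", []))


def names(frame: dict) -> list:
    """Names = union of Common-Name (a single value here) and Synonyms."""
    common = [frame["Common-Name"]] if "Common-Name" in frame else []
    return union_in_order(common, frame.get("Synonyms", []))
```

For the succinate dehydrogenase reaction, with Left holding succinate and FAD and Right holding fumarate and FADH2, `substrates` returns all four compounds.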
68
Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors
69
Editing Pathway/Genome Databases
70
Pathway Tools Paradigm
- Separate database from user interface
- Navigator provides one view of the DB
- Editors provide an alternative view of the DB
71
Invoking the Editors
- Right-Click on an Object Handle
  - Edit
  - Notes
  - Show
- Shift-Middle-Click on an Object Handle
72
Saving Changes
- The user must save changes explicitly with Save KB
- To discard changes made since last save
  - Special -> KB -> Revert KB
73
Administering the Pathway Tools
74
Information Sources
- Pathway Tools User’s Guide
  - aic-export/ecocyc/genopath/released/doc/userguide1.pdf
    - Appendix A: Guide to the Pathway Tools Schema
  - aic-export/ecocyc/genopath/released/doc/userguide2.pdf
- Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
- Pathway Tools Tutorial
  - http://bioinformatics.ai.sri.com/ptools/tutorial/
75
Reporting Problems
- E-mail to ptools-support@ai.sri.com
- Include:
  - Error message
  - Result of :zoom :count :all
  - What version and platform you are running
  - What operation you were performing when the error occurred