Overview of the Pathway Tools Software and Pathway/Genome Databases Peter D. Karp Bioinformatics Research Group SRI International


Pathway/Genome Database Integrating Genomic and Biochemical Data Chromosomes, Plasmids Genes Proteins Reactions Pathways Compounds CELL Operons, Promoters, DNA Binding Sites

Key Functionality
- Pathway analysis
  - Prediction of pathways from genomes
  - Comparative pathway analysis
- Ongoing curation of PGDBs
- WWW publishing of PGDBs
- Analysis of gene expression data

Tools and Datasets PGDB PathwaysGenes Pathway/Genome Navigator PathoLogic Editors Create PGDBs Visualize, Query and Analyze PGDBs Update PGDBs

PathoLogic Pathway Predictor New PGDB Set of Annotated Genes Pathway Prediction MetaCyc PGDB Reports

Prediction of Pathways from Genomes Pathways Compounds Genomic Map Genes Proteins Reactions Metabolic Network Pathway/Genome Database DNA Sequence List of Genes/ORFs List of Gene Products Annotated Genome PathoLogic

MetaCyc Overview
- Meta Metabolic Encyclopedia
- 439 pathways, 1095 enzymes, 4217 reactions
  - 173 E. coli pathways
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- Editor in chief: Dr. Monica Riley

Pathway/Genome Navigator
- Query and visualization tools for PGDBs
  - Metabolic pathways, reactions, compounds
  - Enzymes, transporters, transcription factors
  - Genome maps, genes, operons, promoters, DNA sites
  - Retrieve nucleotide and protein sequences
  - Perform BLAST searches
- Runs as an application on Solaris and Windows
- Runs as a WWW server on Solaris
- Query and comparative analysis functions

Interactive Editing Tools
- Pathway editor
- Reaction editor
- Gene editor
- Enzyme editor
- Compound editor
- Transcription unit editor
- Facilitate updates to PGDBs
  - Improved computational predictions
  - Literature-based data
- Record citations, comments, evidence, history

Pathway Views of Expression Data
- Import gene expression data
- Compute expression ratios
- Obtain pathway-based visualizations of data
  - Numerical spectrum of expression values mapped to a color spectrum
  - Steps of the overview painted with the color corresponding to the expression level(s) of the genes that encode the enzyme(s) for that step
  - Absolute or relative expression values
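The mapping from a numerical spectrum of expression values to a color spectrum can be sketched as a clamped linear interpolation. This is only an illustration: the function name `expression_to_rgb` and the blue-to-red palette are assumptions, not the color scheme that Pathway Tools itself uses.

```python
def expression_to_rgb(value, low, high):
    """Map an expression value in [low, high] to an (R, G, B) tuple.

    Assumed palette: pure blue at `low`, pure red at `high`.
    Values outside the range are clamped to the nearest end.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (value - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    return (round(255 * fraction), 0, round(255 * (1.0 - fraction)))


# Hypothetical expression ratios for three genes.
ratios = {"geneA": 0.0, "geneB": 10.0, "geneC": 25.0}
colors = {gene: expression_to_rgb(r, 0.0, 10.0) for gene, r in ratios.items()}
```

A reaction step catalyzed by several enzymes would need one such color per encoding gene, which is why the slide speaks of expression level(s).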

Environment for Computational Exploration of Genomes Powerful ontology opens many facets of the biology to computational exploration Global characterization of metabolic network Analysis of interface between transport and metabolism Nutrient analysis of metabolic network

PathoLogic Pathway Predictor

PathoLogic Pathway Predictor (PPP)
- Introduction
- Description of PPP execution
- Inputs to PPP
- Using the GUI to create a pathway/genome database
- Output from PPP
- Caveats

PathoLogic Goals
- Create the set of class frames that encode the DB schema
  - Copied from MetaCyc
- Create the appropriate set of instance frames
  - Genes, genetic elements, and proteins are created from the input files
  - Substrates, reactions, and pathways are copied from the reference database
- Interconnect frames in a manner that accurately reflects their semantic relationships

PathoLogic Input/Output
- Inputs:
  - File listing genetic elements
  - Files containing the DNA sequence for each genetic element
  - Files containing the annotation for each genetic element
  - MetaCyc database
- Output:
  - Pathway/genome database for the subject organism
  - Directory tree for the subject organism
  - Reports that summarize:
    - Evidence contained in the input genome for the presence of reference pathways
    - Reactions missing from inferred pathways

Inputs to PathoLogic Pathway Predictor genetic-elements.dat Sequence files GenBank file format PathoLogic format Directory Structure

genetic-elements.dat

ID TEST-CHROM-1
NAME Chromosome 1
TYPE :CHRSM
CIRCULAR? N
ANNOT-FILE chrom1.pf
SEQ-FILE chrom1.fsa
//
ID TEST-CHROM-2
NAME Chromosome 2
CIRCULAR? N
ANNOT-FILE /mydata/chrom2.gbk
SEQ-FILE /mydata/chrom2.fna
//
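As a rough illustration of how such a file is structured, the sketch below reads the two records from the slide into dictionaries. It assumes one attribute per line, a whitespace separator between attribute name and value, and `//` as the record terminator. The function name `parse_genetic_elements` is hypothetical and is not part of Pathway Tools.

```python
def parse_genetic_elements(text):
    """Parse '//'-terminated records into a list of attribute dicts."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":  # end of the current record
            if current:
                records.append(current)
            current = {}
            continue
        # Attribute name is the first token; the rest of the line is the value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:  # tolerate a missing final '//'
        records.append(current)
    return records


SAMPLE = """\
ID TEST-CHROM-1
NAME Chromosome 1
TYPE :CHRSM
CIRCULAR? N
ANNOT-FILE chrom1.pf
SEQ-FILE chrom1.fsa
//
ID TEST-CHROM-2
NAME Chromosome 2
CIRCULAR? N
ANNOT-FILE /mydata/chrom2.gbk
SEQ-FILE /mydata/chrom2.fna
//
"""

elements = parse_genetic_elements(SAMPLE)
for element in elements:
    print(element["ID"], element["ANNOT-FILE"], element["SEQ-FILE"])
```

A check like this is a cheap way to confirm that every genetic element names both an annotation file and a sequence file before starting a build.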

File Naming Conventions
- One pair of sequence and annotation files for each genetic element
- Sequence files: FASTA format
  - Suffix .fsa or .fna
- Annotation file:
  - GenBank format: suffix .gbk
  - PathoLogic format: suffix .pf

GenBank File Format
- Accepted feature types:
  - CDS, tRNA, rRNA, misc_RNA
- Accepted qualifiers:
  - /label: Unique ID [recm]
  - /gene: Gene name [req]
  - /product [req]
  - /EC_number [recm]
  - /product_comment [opt]
  - /gene_comment [opt]
  - /alt_name: Synonyms [opt]
- For multifunctional proteins, put each function in a separate /product line

Typical Problems Using GenBank Files With PathoLogic
- Wrong qualifier names used
- Extraneous information in a given qualifier
- Check the results of the trial parse carefully

PathoLogic File Format
- Each record starts with a line containing an ID attribute
- Tab delimited
- Each record ends with a line containing //
- One attribute-value pair is allowed per line
  - Use multiple FUNCTION lines for multifunctional proteins
- Lines starting with ‘;’ are comment lines
- Valid attributes are:
  - ID, NAME, SYNONYM
  - STARTBASE, ENDBASE, GENE-COMMENT
  - FUNCTION, PRODUCT-TYPE, EC, FUNCTION-COMMENT
  - DBLINK

PathoLogic File Format (example)

ID	TP0734
NAME	deoD
STARTBASE
ENDBASE
FUNCTION	purine nucleoside phosphorylase
DBLINK	PID:g
PRODUCT-TYPE	P
GENE-COMMENT	similar to GP: percent identity: 57.51; identified by sequence similarity; putative
//
ID	TP0735
NAME	gltA
STARTBASE
ENDBASE
FUNCTION	glutamate synthase
DBLINK	PID:g
PRODUCT-TYPE	P
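The format rules above (tab-delimited attribute-value pairs, `//` terminators, `;` comment lines, repeated FUNCTION lines) can be sketched as a small reader. This is an illustrative parser, not the one PathoLogic uses. The name `parse_pf` and the sample record, including its identifier, coordinates, and function names, are hypothetical, because the coordinate values on the slide did not survive extraction.

```python
VALID_ATTRIBUTES = {
    "ID", "NAME", "SYNONYM", "STARTBASE", "ENDBASE", "GENE-COMMENT",
    "FUNCTION", "PRODUCT-TYPE", "EC", "FUNCTION-COMMENT", "DBLINK",
}


def parse_pf(text):
    """Parse PathoLogic-format text into a list of {attribute: [values]}."""
    records, current = [], None
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith(";"):
            continue  # blank line or comment line
        if line.strip() == "//":  # record terminator
            if current is not None:
                records.append(current)
            current = None
            continue
        attribute, tab, value = line.partition("\t")
        if not tab or attribute not in VALID_ATTRIBUTES:
            raise ValueError(f"line {number}: bad attribute line {line!r}")
        if current is None:
            if attribute != "ID":
                raise ValueError(f"line {number}: record must start with ID")
            current = {}
        # Keep every value in a list so repeated FUNCTION lines are preserved.
        current.setdefault(attribute, []).append(value.strip())
    if current is not None:
        records.append(current)
    return records


# Hypothetical record: a bifunctional protein with two FUNCTION lines.
SAMPLE_PF = (
    "; hypothetical example record\n"
    "ID\tGENE-0001\n"
    "NAME\tabcD\n"
    "STARTBASE\t100\n"
    "ENDBASE\t900\n"
    "FUNCTION\tfirst hypothetical function\n"
    "FUNCTION\tsecond hypothetical function\n"
    "PRODUCT-TYPE\tP\n"
    "//\n"
)

genes = parse_pf(SAMPLE_PF)
```

Storing each attribute as a list is a deliberate choice here: it keeps the one-pair-per-line rule while still representing multifunctional proteins.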

Using the PPP GUI to Create a Pathway/Genome Database
- Input project information
  - Organism -> Create New
- Trial parse
  - Build -> Trial Parse
- Build pathway/genome database
  - Build -> Automated Build
- Manual polishing
  - Refine -> Resolve Ambiguous Name Matches
  - Refine -> Assign Modified Proteins
  - Refine -> Create Protein Complexes
  - Refine -> Run Consistency Checker
  - Refine -> Update Overview

PathoLogic Command Menus
- Organism
  - Select
  - Create New
  - Save KB
  - Revert KB
  - Reinitialize KB
  - Exit
- Build
  - Trial Parse
  - Automated Build
- Refine
  - Resolve Ambiguous Name Matches
  - Assign Modified Proteins
  - Create Protein Complexes
  - Re-run Name Matcher
  - Rescore Pathways
  - Run Consistency Checker
  - Update Overview

Input Project Information

PathoLogic PP Parse Output

Enzyme Name to Reaction Mapping

Enzyme Name Matching Tool
- Dictionary of enzyme names assembled from:
  - All metabolic reactions found in MetaCyc
  - Two files that map synonyms not found in MetaCyc to reaction names:
    - System file (pangea-enzyme-mappings.dat)
    - User-supplied file (local-enzyme-mappings.dat)
- Location of sources:
  - $GPROOT/pathologic/$VERSION-NUMBER/data

Enzyme Name Matcher
- Matches on the full enzyme name
- The match is case-insensitive and removes the punctuation characters “ -_(){}',:”
- Also matches after removal of prefixes and suffixes such as:
  - “Putative”, “Hypothetical”, etc.
  - alpha|beta|…|catalytic|inducible chain|subunit|component
  - Parenthetical gene name
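The matching rules on this slide can be approximated as follows: build a canonical key from the full name, then retry with the parenthetical gene name, the chain/subunit/component suffix, and qualifier prefixes stripped. This is a sketch of the idea only. The regular expressions cover just the affixes listed on the slide, and the reaction identifier `RXN-A` and all function names are hypothetical.

```python
import re

PUNCTUATION = " -_(){}',:"  # characters ignored when comparing names
PREFIXES = ("putative", "hypothetical")
SUFFIX_RE = re.compile(
    r"[\s,]+(?:(?:alpha|beta|catalytic|inducible)\s+)?"
    r"(?:chain|subunit|component)\s*$",
    re.IGNORECASE,
)
PAREN_RE = re.compile(r"\s*\([^()]*\)\s*$")  # trailing "(geneName)"


def canonical(name):
    """Lower-case the name and drop the ignored punctuation characters."""
    return "".join(ch for ch in name.lower() if ch not in PUNCTUATION)


def candidate_keys(name):
    """Return lookup keys: the full name first, then the stripped name."""
    keys = [canonical(name)]
    stripped = SUFFIX_RE.sub("", PAREN_RE.sub("", name))
    words = stripped.split()
    while words and words[0].lower().rstrip(",") in PREFIXES:
        words = words[1:]
    key = canonical(" ".join(words))
    if key and key not in keys:
        keys.append(key)
    return keys


def match_enzyme(name, dictionary):
    """Look up a reaction for an enzyme name, or return None."""
    for key in candidate_keys(name):
        if key in dictionary:
            return dictionary[key]
    return None


# Hypothetical one-entry dictionary keyed by canonical enzyme name.
enzyme_dictionary = {canonical("Purine-nucleoside phosphorylase"): "RXN-A"}
hit = match_enzyme("Putative purine nucleoside phosphorylase (deoD)", enzyme_dictionary)
```

Trying the full name before any stripped variant mirrors the order on the slide, so an exact dictionary entry always wins over a looser match.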

Enzyme Name Matcher
- For names that do not match, the software identifies probable metabolic enzymes as those:
  - Containing “ase”
  - Not containing keywords such as:
    - “sensor kinase”
    - “topoisomerase”
    - “protein kinase”
    - “peptidase”
    - Etc.
- Research unknown enzymes
  - MetaCyc, Swiss-Prot, PIR, Medline, EMP
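The filter for unmatched names amounts to a substring test plus a keyword blacklist. In the sketch below the function name is hypothetical, and the exclusion list contains only the four keywords shown on the slide. The slide's "Etc" indicates that the real list is longer.

```python
EXCLUDED_KEYWORDS = (
    "sensor kinase",
    "topoisomerase",
    "protein kinase",
    "peptidase",
)


def is_probable_metabolic_enzyme(name):
    """True if the name contains 'ase' and none of the excluded keywords."""
    lowered = name.lower()
    if "ase" not in lowered:
        return False
    return not any(keyword in lowered for keyword in EXCLUDED_KEYWORDS)


unmatched = ["glutamate synthase", "DNA topoisomerase I", "hypothetical protein"]
to_research = [n for n in unmatched if is_probable_metabolic_enzyme(n)]
```

Names that pass the filter are the ones worth researching by hand in the resources listed above.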

Assigning Evidence Scores to Predicted Pathways
- X|Y|Z denotes the score for pathway P in organism O, where:
  - X = total number of reactions in P
  - Y = number of reactions in P for which there is evidence of a catalyzing enzyme in O
  - Z = number of the Y reactions that are also used in other pathways in O
- It is not clear how to convert these scores into a probability of occurrence
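The X|Y|Z score can be computed directly from reaction sets. The sketch below assumes that each pathway is given as a list of reaction identifiers and that the set of reactions with enzyme evidence in the organism is already known. The data and the function name `evidence_score` are made up for illustration.

```python
def evidence_score(pathway, pathways, reactions_with_evidence):
    """Return (X, Y, Z) for one pathway.

    pathways: dict mapping pathway name -> list of reaction ids
    reactions_with_evidence: set of reaction ids with an enzyme in the organism
    """
    reactions = pathways[pathway]
    present = [r for r in reactions if r in reactions_with_evidence]
    # Reactions that occur in any pathway other than the one being scored.
    used_elsewhere = set()
    for name, other_reactions in pathways.items():
        if name != pathway:
            used_elsewhere.update(other_reactions)
    shared = [r for r in present if r in used_elsewhere]
    return len(reactions), len(present), len(shared)


# Toy data: r3 is shared between the two pathways.
toy_pathways = {"P1": ["r1", "r2", "r3"], "P2": ["r3", "r4"]}
toy_evidence = {"r1", "r3"}

x, y, z = evidence_score("P1", toy_pathways, toy_evidence)
print(f"{x}|{y}|{z}")
```

A large Z relative to Y signals that most of the evidence for a pathway could equally be explained by other pathways, which is what makes short pathways hard to interpret.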

Algorithm for Automated Pathway Pruning
- A pathway will never be pruned if it contains a unique enzyme (an enzyme not present in any other pathway)
- A pathway will be pruned if one of the following conditions holds:
  - Evidence is better for a different pathway in the same variant set
  - There is evidence for only one reaction in the pathway, or
  - Its set of reactions present is a proper subset of the reactions present in some other pathway, and
    - If the pathway is a biosynthetic pathway, the final reaction(s) are missing
    - If the pathway is a degradation pathway, the initial reaction(s) are missing
    - If the pathway is an energy metabolism pathway, more than half the reactions are missing
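One possible reading of these pruning rules is sketched below. Several simplifications are assumptions rather than PathoLogic behavior: uniqueness is tested on reactions instead of enzymes, "better evidence" means more reactions present, only the single first or last reaction is checked, and a pathway with no evidence at all is pruned. The `Pathway` class and `should_prune` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pathway:
    name: str
    reactions: List[str]  # ordered: first is initial, last is final
    kind: str  # "biosynthesis", "degradation", or "energy"
    variant_set: Optional[str] = None


def should_prune(pathway, all_pathways, present):
    """Decide whether to prune `pathway`, given the set of reactions present."""
    others = [q for q in all_pathways if q.name != pathway.name]
    mine = {r for r in pathway.reactions if r in present}
    if not mine:
        return True  # assumption: no evidence at all means prune
    used_elsewhere = {r for q in others for r in q.reactions}
    if mine - used_elsewhere:
        return False  # unique evidence: never pruned
    for q in others:  # a better-supported variant exists
        same_set = pathway.variant_set and q.variant_set == pathway.variant_set
        if same_set and len(set(q.reactions) & present) > len(mine):
            return True
    if len(mine) == 1:
        return True
    missing = [r for r in pathway.reactions if r not in present]
    for q in others:
        if mine < (set(q.reactions) & present):  # proper subset
            if pathway.kind == "biosynthesis" and pathway.reactions[-1] in missing:
                return True
            if pathway.kind == "degradation" and pathway.reactions[0] in missing:
                return True
            if pathway.kind == "energy" and len(missing) > len(pathway.reactions) / 2:
                return True
    return False


present_reactions = {"a", "b", "c"}
p1 = Pathway("P1", ["a", "b", "c"], "biosynthesis")
p2 = Pathway("P2", ["a", "b", "x"], "biosynthesis")
p3 = Pathway("P3", ["y", "b", "z"], "degradation")
candidates = [p1, p2, p3]
kept = [p.name for p in candidates if not should_prune(p, candidates, present_reactions)]
```

In this toy example P1 survives because reaction `c` occurs nowhere else, P2 is pruned because its evidence is a proper subset of P1's and its final reaction is missing, and P3 is pruned because only one of its reactions has evidence.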

Creating Protein Complexes

Complex Subunits Stoichiometries

Proteins as Reaction Substrates

Manual Pruning of Pathways
- Use the pathway evidence report
  - Coloring scheme aids in assessing pathway evidence
- Phase I: Prune extra variant pathways
- Rescore pathways, regenerate the pathway evidence report
- Phase II: Prune pathways unlikely to be present
  - No or few unique enzymes
  - Most pathway steps present because they are used in another pathway
  - Pathway very unlikely to be present in this organism

Overview Graph

Output from PPP
- Pathway/genome database
- Summary pages
  - Pathway evidence page
    - Click “Summary of Organisms”, then click the organism name, then click “Pathway Evidence”, then click “Save Pathway Report”
  - Missing enzymes report
- Directory tree containing sequence files, reports, etc.

Resulting Directory Structure
ROOT/aic-export/ecocyc/ORGIDcyc/VERSION/
- input
  - organism.dat
  - organism-init.dat
  - genetic-elements.dat
  - annotation files
  - sequence files
- reports
  - name-matching-report.txt
  - trial-parse-report.txt
- kb
  - ORGIDbase.ocelot
- data
  - overview.graph
- released -> VERSION

Caveats
- Cannot predict pathways not present in MetaCyc
- Evidence for short pathways is hard to interpret
- Because many reactions occur in many pathways, there are many false positives

The Pathway Tools Schema

Motivations for Understanding Schema
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
- When writing complex Lisp queries against PGDBs, those queries must name classes and slots within the schema
- A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Reference
- Pathway Tools User’s Guide, Volume I
  - Appendix A: Guide to the Pathway Tools Schema

Web of Relationships for One Enzyme
[Diagram] Polypeptides: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2. Genes: sdhA, sdhB, sdhC, sdhD. Reaction: Succinate + FAD = fumarate + FADH2. Other frames: Enzymatic-reaction, Succinate dehydrogenase, TCA Cycle.

Frame Data Model and Schema
- Frame Data Model: organizational principle for a DB
- Object Displays
- Schema
  - Gene slots
  - Polypeptide slots
  - Protein slots
  - Protein Complex slots
  - Reaction slots
  - Enzymatic Reaction slots

Frame Data Model Knowledge base (KB, Database, DB) Frames Slots Facets Annotations

Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- Can be stored within:
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes have:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame

Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot

Slot Links
[Diagram] Polypeptides: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2. Genes: sdhA, sdhB, sdhC, sdhD. Reaction: Succinate + FAD = fumarate + FADH2. Other frames: Enzymatic-reaction, Succinate dehydrogenase, TCA Cycle. Slot links shown: product, component-of, catalyzes, reaction, in-pathway.

Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any Lisp object: integer, real, string, symbol (frame name), list
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
  - Slot Gene in class Polypeptides
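To make the frame model concrete, the toy knowledge base below stores frames as dictionaries of multivalued slots and keeps a declared inverse slot in step automatically. It is not the Ocelot API. The class `ToyKB` and its methods are hypothetical stand-ins meant only to show how the Product and Gene slots mirror each other.

```python
class ToyKB:
    """Minimal frame store: frame name -> {slot name -> list of values}."""

    def __init__(self):
        self.frames = {}
        self.inverses = {}

    def declare_inverse(self, slot, inverse_slot):
        """Register two slots as encoding opposite relationships."""
        self.inverses[slot] = inverse_slot
        self.inverses[inverse_slot] = slot

    def put(self, frame, slot, value):
        """Add a value to a slot, updating the inverse slot if one exists."""
        values = self.frames.setdefault(frame, {}).setdefault(slot, [])
        if value not in values:
            values.append(value)
        inverse = self.inverses.get(slot)
        if inverse is not None:
            back = self.frames.setdefault(value, {}).setdefault(inverse, [])
            if frame not in back:
                back.append(frame)

    def get(self, frame, slot):
        """Return a copy of the slot's values (empty list if unset)."""
        return list(self.frames.get(frame, {}).get(slot, []))


kb = ToyKB()
kb.declare_inverse("Product", "Gene")
kb.put("sdhA", "Product", "Sdh-flavo")  # gene -> polypeptide
print(kb.get("Sdh-flavo", "Gene"))  # the inverse link was filled in
```

Following slot values from frame to frame in this way is what the earlier slides mean by a web of interconnected objects.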

Representation of Function
[Diagram] Polypeptides: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2. Genes: sdhA, sdhB, sdhC, sdhD. Reaction: Succinate + FAD = fumarate + FADH2. Other frames: Enzymatic-reaction, Succinate dehydrogenase, TCA Cycle. Attributes shown: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position.

Monofunctional Monomer Gene Reaction Enzymatic-reaction Monomer Pathway

Bifunctional Monomer Gene Reaction Enzymatic-reaction Monomer Pathway Reaction Enzymatic-reaction

Monofunctional Multimer Monomer Gene Reaction Enzymatic-reaction Multimer Pathway

Pathway and Substrates Reactant-1 Reaction Pathway Reaction Reactant-2 Product-2 Product-1 in-pathway left right

Transcriptional Regulation
[Diagram] Frames shown: site001, pro001, trpE, trpD, trpC, trpB, trpA, trpL, Int003, RpoSig70, TrpR*trp, Int001, trpLEDCBA, trp, apoTrpR, Int005.

Annotations
- Encode information about individual slot values
- Used to attach comments and citations to slot values
- Example:
  - Frame tryptophan-synthetase has a slot called Molecular-Weight with a value of 28
  - Attached to that value is an annotation whose label is Citation and whose value is “[ ]”

Facets
- Encode information about slots
- Allow association between a slot and:
  - comments
  - citations
- Example: Comment attached to Inhibitors of EnzRxn
- Allow access to schema information

Principal Classes
- Class names are capitalized and plural
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

Principal Classes (continued)
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

Slots in Multiple Classes Common-Name Synonyms Names (computed as union of Common-Name, Synonyms) Comment Citations DB-Links

Genes Slots Chromosome Left-End-Position Right-End-Position Centisome-Position Transcription-Direction Product

Proteins Slots Molecular-Weight-Seq Molecular-Weight-Exp pI Locations Modified-Form Unmodified-Form Component-Of

Polypeptides Slots Gene

Protein-Complexes Slots Components

Reactions Slots EC-Number Left, Right Substrates (computed as union of Left, Right) DeltaG0 Keq Spontaneous? Species
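A computed slot such as Substrates can be thought of as a function of other slots rather than stored data. The sketch below represents a reaction frame as a plain dictionary, which is an assumption made for illustration, and derives Substrates as the union of Left and Right for the succinate dehydrogenase reaction shown in the earlier diagrams.

```python
def substrates(reaction):
    """Union of the Left and Right slot values, in first-seen order."""
    combined = []
    for compound in reaction.get("Left", []) + reaction.get("Right", []):
        if compound not in combined:
            combined.append(compound)
    return combined


# Succinate + FAD = fumarate + FADH2
succinate_dehydrogenase_rxn = {
    "Left": ["succinate", "FAD"],
    "Right": ["fumarate", "FADH2"],
}
all_substrates = substrates(succinate_dehydrogenase_rxn)
```

The Names slot listed under Slots in Multiple Classes is computed in the same manner, as the union of Common-Name and Synonyms.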

Enzymatic-Reactions Slots Enzyme Reaction Activators Inhibitors Physiologically-Relevant Cofactors Prosthetic-Groups Alternative-Substrates Alternative-Cofactors

Editing Pathway/Genome Databases

Pathway Tools Paradigm
- Separate the database from the user interface
- The Navigator provides one view of the DB
- The Editors provide an alternative view of the DB

Invoking the Editors
- Right-click on an object handle
  - Edit
  - Notes
  - Show
- Shift-middle-click on an object handle

Saving Changes
- The user must save changes explicitly with Save KB
- To discard changes made since the last save:
  - Special -> KB -> Revert KB

Administering the Pathway Tools

Information Sources
- Pathway Tools User’s Guide
  - aic-export/ecocyc/genopath/released/doc/userguide1.pdf
    - Appendix A: Guide to the Pathway Tools Schema
  - aic-export/ecocyc/genopath/released/doc/userguide2.pdf
- Pathway Tools Web Site
  - Pathway Tools Tutorial

Reporting Problems
- Include:
  - The error message
  - The result of :zoom :count :all
  - The version and platform you are running
  - The operation you were performing when the error occurred