PathoLogic Pathway Predictor. SRI International Bioinformatics Inference of Metabolic Pathways Pathway/Genome Database Annotated Genomic Sequence Genes/ORFs.

Presentation transcript:

PathoLogic Pathway Predictor

SRI International Bioinformatics
Inference of Metabolic Pathways
[Diagram]
- Inputs: Annotated Genomic Sequence (Genes/ORFs, Gene Products, DNA Sequences) and Multi-organism Pathway Database, MetaCyc (Reactions, Pathways, Compounds)
- PathoLogic Software: integrates genome and pathway data to identify putative metabolic networks
- Output: Pathway/Genome Database (Genomic Map, Genes, Gene Products, Reactions, Pathways, Compounds)

PathoLogic Functionality
- Initialize schema for new PGDB
- Transform existing genome to PGDB form
- Infer metabolic pathways and store in PGDB
- Infer operons and store in PGDB
- Assemble Overview diagram
- Assist user with manual tasks
  - Assign enzymes to reactions they catalyze
  - Identify false-positive pathway predictions
  - Build protein complexes from monomers
  - Infer transport reactions
  - Fill pathway holes

PathoLogic Analysis Phases
- Trial parsing of input data files; fix errors
- Initialize schema of new PGDB (automatic)
- Create DB objects for replicons, genes, proteins (automatic)
- Assign enzymes to reactions they catalyze (part automatic, part manual)
- From assigned reactions, infer what pathways are present (automatic, with manual review)

PathoLogic Analysis Phases (continued)
- Define metabolic overview diagram (automatic, redo after changing data)
- Define protein complexes (manual)
- Define transcription units (automatic)
- Infer transport reactions (manual review necessary)
- Fill Pathway Holes (manual review necessary)

PathoLogic Input/Output
Inputs:
- List of all genetic elements
  - Enter using GUI or provide a file
- Files containing annotation for each genetic element
- Files containing DNA sequence for each genetic element
- MetaCyc database
Output:
- Pathway/genome database for the subject organism
- Reports that summarize:
  - Evidence in the input genome for the presence of reference pathways
  - Reactions missing from inferred pathways

File Naming Conventions
- One pair of sequence and annotation files for each genetic element
- Sequence files: FASTA format
  - Suffix .fsa or .fna
- Annotation file:
  - GenBank format: suffix .gbk
  - PathoLogic format: suffix .pf
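The suffix conventions above can be checked mechanically before a build. The following is a minimal sketch, not part of Pathway Tools: the function name `classify_input_file` and the role labels it returns are illustrative assumptions, and only the four suffixes come from the slide.

```python
from pathlib import Path

# Suffixes taken from the slide; the role labels are hypothetical names
# chosen for this sketch, not identifiers used by Pathway Tools.
SEQUENCE_SUFFIXES = {".fsa", ".fna"}
ANNOTATION_SUFFIXES = {".gbk": "genbank", ".pf": "pathologic"}


def classify_input_file(filename: str) -> str:
    """Return a label describing the role of an input file, based on its suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix in SEQUENCE_SUFFIXES:
        return "sequence/fasta"
    if suffix in ANNOTATION_SUFFIXES:
        return "annotation/" + ANNOTATION_SUFFIXES[suffix]
    raise ValueError(f"unrecognized suffix {suffix!r} in {filename!r}")


if __name__ == "__main__":
    for name in ["chrom1.fsa", "chrom1.pf", "/mydata/chrom2.gbk", "/mydata/chrom2.fna"]:
        print(name, "->", classify_input_file(name))
```

A check like this is only a convenience for catching misnamed files early; the trial parse remains the real test of the input.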

Typical Problems Using GenBank Files With PathoLogic
- Wrong qualifier names used: read the PathoLogic documentation!
- Extraneous information in a given qualifier
- Check results of trial parse carefully

GenBank File Format
Accepted feature types:
- CDS, tRNA, rRNA, misc_RNA
Accepted qualifiers ([req] = required, [recm] = recommended, [opt] = optional):
- /locus_tag: unique ID [recm]
- /gene: gene name [req]
- /product [req]
- /EC_number [recm]
- /product_comment [opt]
- /gene_comment [opt]
- /alt_name: synonyms [opt]
- /pseudo: gene is a pseudogene [opt]
- /db_xref: DB:AccessionID [opt]
- /go_component, /go_function, /go_process: GO terms [opt]
For multifunctional proteins, put each function in a separate /product line
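Since wrong qualifier names are a common problem, it can help to screen each feature's qualifiers against the accepted list before running the trial parse. This is a minimal sketch under stated assumptions: the function `check_qualifiers` is hypothetical, it takes qualifier names without the leading slash, and it reads [req], [recm], and [opt] as required, recommended, and optional.

```python
# Qualifier names come from the slide; the grouping into three sets assumes
# that [req], [recm], and [opt] mean required, recommended, and optional.
REQUIRED = {"gene", "product"}
RECOMMENDED = {"locus_tag", "EC_number"}
OPTIONAL = {
    "product_comment", "gene_comment", "alt_name", "pseudo",
    "db_xref", "go_component", "go_function", "go_process",
}


def check_qualifiers(qualifier_names):
    """Compare one feature's qualifier names against the accepted list.

    Returns a dict of sorted lists: required qualifiers that are missing,
    recommended qualifiers that are missing, and names not on the list.
    """
    present = set(qualifier_names)
    accepted = REQUIRED | RECOMMENDED | OPTIONAL
    return {
        "missing_required": sorted(REQUIRED - present),
        "missing_recommended": sorted(RECOMMENDED - present),
        "unrecognized": sorted(present - accepted),
    }


if __name__ == "__main__":
    # Hypothetical feature that uses /note, which is not on the accepted list.
    print(check_qualifiers(["gene", "product", "note"]))
```

Names reported as unrecognized are not necessarily errors in GenBank terms; they are simply not on the list of qualifiers the slide says PathoLogic accepts.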

PathoLogic File Format
- Each record starts with a line containing an ID attribute
- Tab delimited
- Each record ends with a line containing //
- One attribute-value pair is allowed per line
  - Use multiple FUNCTION lines for multifunctional proteins
- Lines starting with ‘;’ are comment lines
- Valid attributes are:
  - ID, NAME, SYNONYM
  - STARTBASE, ENDBASE, GENE-COMMENT
  - FUNCTION, PRODUCT-TYPE, EC, FUNCTION-COMMENT
  - DBLINK
  - GO
  - INTRON

PathoLogic File Format (example)
ID	TP0734
NAME	deoD
STARTBASE
ENDBASE
FUNCTION	purine nucleoside phosphorylase
DBLINK	PID:g
PRODUCT-TYPE	P
GENE-COMMENT	similar to GP: percent identity: 57.51; identified by sequence similarity; putative
//
ID	TP0735
NAME	gltA
STARTBASE
ENDBASE
FUNCTION	glutamate synthase
DBLINK	PID:g
PRODUCT-TYPE	P
GO	glutamate synthase (NADPH) activity [goid ] [evidence IDA] [pmid ]
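The record layout described for PathoLogic-format files (tab-delimited attribute-value lines, records closed by //, comment lines starting with ';') is simple enough to read with a few lines of code. The following is a rough sketch, not the parser Pathway Tools uses: the function `parse_pf` is hypothetical, and the sample record uses made-up IDs, names, and coordinates purely for illustration.

```python
def parse_pf(text):
    """Split PathoLogic-format text into records.

    Each record becomes a dict mapping an attribute name to the list of its
    values, so repeated attributes (such as several FUNCTION lines) are kept.
    """
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # skip blank lines and comment lines
        if line.strip() == "//":
            if current:
                records.append(current)
            current = {}
            continue
        attribute, _, value = line.partition("\t")
        current.setdefault(attribute.strip(), []).append(value.strip())
    if current:  # tolerate a final record without a closing //
        records.append(current)
    return records


# Hypothetical sample data; none of these values come from a real genome.
SAMPLE = (
    "; illustrative record\n"
    "ID\tGENE0001\n"
    "NAME\tabcD\n"
    "STARTBASE\t100\n"
    "ENDBASE\t400\n"
    "FUNCTION\tfirst function\n"
    "FUNCTION\tsecond function\n"
    "PRODUCT-TYPE\tP\n"
    "//\n"
)

if __name__ == "__main__":
    for record in parse_pf(SAMPLE):
        print(record["ID"][0], record["FUNCTION"])
```

Storing every attribute as a list keeps the handling of multifunctional proteins uniform, at the cost of indexing `[0]` for single-valued attributes.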

Before you start: What to do when an error occurs
- Most Navigator errors are automatically trapped; debugging information is saved to the error.tmp file.
- All other errors (including most PathoLogic errors) will cause the software to drop into the Lisp debugger
  - Unix: the error message will show up in the original terminal window from which you started Pathway Tools.
  - Windows: the error message will show up in the Lisp console. The Lisp console usually starts out iconified; its icon is a blue bust of Franz Liszt.
- Two goals when an error occurs:
  - Try to continue working
  - Obtain enough information for a bug report to send to the pathway-tools support team.

The Lisp Debugger
Sample error (details and number of restart actions differ for each case):
Error: Received signal number 2 (Keyboard interrupt)
Restart actions (select using :continue):
 0: continue computation
 1: Return to command level
 2: Pathway Tools version 10.0 top level
 3: Exit Pathway Tools version 10.0
[1c] EC(2):
- To generate debugging information (stack backtrace): :zoom :count :all
- To continue from the error, find a restart that takes you to the top level (in this case, number 2): :cont 2
- To exit Pathway Tools: :exit

How to report an error
- Determine if the problem is reproducible, and how to reproduce it (make sure you have all the latest patches installed)
- Send to the pathway-tools support team:
  - Pathway Tools version number and platform
  - Description of exactly what you were doing (which command you invoked, what you typed, etc.) or instructions for how to reproduce the problem
  - error.tmp file, if one was generated
  - If the software breaks into the Lisp debugger, the complete error message and stack backtrace (obtained using the command :zoom :count :all, as described on the previous slide)

PathoLogic Command Menus
Invoking PathoLogic: Tools -> PathoLogic
Organism
- Select
- Create New
- Save KB
- Revert KB
- Reinitialize KB
- Convert File KB to Oracle KB
- Convert File KB to MySQL KB
- Backup KB to File
- New Version
- Specify Reference PGDB(s)
- Exit
Build
- Trial Parse
- Automated Build
- Update Build for Revised Annotation
Refine
- Assign Probable Enzymes
- Assign Modified Proteins
- Create Protein Complexes
- Re-run Name Matcher
- Rescore Pathways
- Predict transcription units
- Transport Identification Parser
- Update Overview
- Pathway Hole Filler

Using the PPP GUI to Create a Pathway/Genome Database
Input Project Information
- Organism -> Create New
- Creates directory structure for new PGDB
- Creates and saves empty PGDB, populated only with objects common to all PGDBs (schema classes, elements, etc.) and data you entered in the form.
- Offers to invoke Replicon Editor

Input Project Information

Enter Replicon Information
For each replicon
- Name
- Type: chromosome, plasmid, etc.
- Circular?
- Annotation file
- Sequence file (optional)
- Contigs (optional)
- Links to other DBs (optional)
GUI-Based Entry
- Build -> Specify Replicons
File-Based Entry
- Create genetic-elements.dat file using template provided

GUI-Based Replicon Entry

Batch Entry of Replicon Info
File / cyc/ /input/genetic-elements.dat:
ID	TEST-CHROM-1
NAME	Chromosome 1
TYPE	:CHRSM
CIRCULAR?	N
ANNOT-FILE	chrom1.pf
SEQ-FILE	chrom1.fsa
//
ID	TEST-CHROM-2
NAME	Chromosome 2
CIRCULAR?	N
ANNOT-FILE	/mydata/chrom2.gbk
SEQ-FILE	/mydata/chrom2.fna
//
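For genomes with many replicons, the genetic-elements.dat file can be generated rather than typed by hand. The sketch below is one possible way to do that: the function `format_genetic_elements` is hypothetical, and it assumes that attribute and value are separated by a tab, as in the PathoLogic annotation format. The attribute names and values are the ones shown in the slide's example.

```python
def format_genetic_elements(replicons):
    """Render a list of replicon descriptions as genetic-elements.dat text.

    Each replicon is a dict of attribute -> value; attributes are written in
    insertion order, one per line, and every record is closed with //.
    A tab separator is assumed here.
    """
    lines = []
    for replicon in replicons:
        for attribute, value in replicon.items():
            lines.append(f"{attribute}\t{value}")
        lines.append("//")
    return "\n".join(lines) + "\n"


# The two records from the slide's example.
REPLICONS = [
    {
        "ID": "TEST-CHROM-1",
        "NAME": "Chromosome 1",
        "TYPE": ":CHRSM",
        "CIRCULAR?": "N",
        "ANNOT-FILE": "chrom1.pf",
        "SEQ-FILE": "chrom1.fsa",
    },
    {
        "ID": "TEST-CHROM-2",
        "NAME": "Chromosome 2",
        "CIRCULAR?": "N",
        "ANNOT-FILE": "/mydata/chrom2.gbk",
        "SEQ-FILE": "/mydata/chrom2.fna",
    },
]

if __name__ == "__main__":
    print(format_genetic_elements(REPLICONS), end="")
```

The template file shipped with the software remains the authoritative reference for which attributes are allowed and how they are delimited.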

Specify Reference PGDB(s)
- This step is optional, and most users will omit it
- MetaCyc is always the primary reference PGDB
- Specify an additional reference PGDB if you have your own curated PGDB which has:
  - Pathways and/or reactions that are not in MetaCyc
  - Manual functional assignments, with names similar to current genome
- There is no point specifying any of our PGDBs as references, only your own curated PGDBs.

Building the PGDB
Trial Parse
- Build -> Trial Parse
- Check output to ensure numbers “look right”
  - Same number of gene start positions, end positions, names
  - Did my file contain EC numbers? Were they detected?
  - Did my file contain RNAs? Were they detected?
- Fix any errors in input files
Build pathway/genome database
- Build -> Automated Build

PathoLogic Parser Output

Automated Build
- Parses input files
- Creates objects for every gene and gene product
- Uses EC numbers, GO annotations and name matcher to match enzymes to reactions in MetaCyc
- Imports catalyzed reactions and compounds from MetaCyc
- Generates list of likely enzymes that couldn’t be assigned
- Infers pathways likely to be present
- Generates Cellular Overview Diagram (first pass)
- Generates reports

Assign Enzymes to Reactions
[Flowchart]
- Gene product (example: UDP-glucose-4-epimerase) is compared against MetaCyc reactions (example: UDP-D-glucose <=> UDP-galactose)
- Match? yes: Assign
- Match? no: Probable enzyme (-ase)?
  - no: Not a metabolic enzyme
  - yes: Manually search
    - yes: Assign
    - no: Can’t Assign

Matching Enzymes to Reactions
- Matches on full EC number (partial ECs ignored)
- Matches on Molecular Function GO terms
  - If definition of GO term includes cross-reference either to an EC number or to a MetaCyc reaction.
- Matches on full enzyme name
  - Match is case-insensitive and removes the punctuation characters “ -_(){}',:”
  - Also matches after removal of prefixes and suffixes such as:
    - “Putative”, “Hypothetical”, etc.
    - alpha|beta|…|catalytic|inducible chain|subunit|component
    - Parenthetical gene name
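The name-matching rules above amount to normalizing both names before comparing them. The following is an illustrative sketch of that idea, not the actual name matcher: the function `normalize_enzyme_name`, the regular expressions, and the shortened prefix and suffix lists are assumptions, and the real matcher tries the full name first and the stripped forms afterwards rather than collapsing everything in one pass.

```python
import re

# Punctuation characters listed on the slide (the first one is a space).
PUNCTUATION = " -_(){}',:"
# Only the prefixes and suffix words named on the slide; the slide's "etc."
# and "…" indicate the real lists are longer.
PREFIXES = ("putative", "hypothetical")
SUFFIX_RE = re.compile(
    r"\s+(alpha|beta|catalytic|inducible)\s+(chain|subunit|component)$"
)
# A trailing "(...)" is assumed to be a parenthetical gene name.
TRAILING_PAREN_RE = re.compile(r"\s*\([^()]*\)\s*$")


def normalize_enzyme_name(name: str) -> str:
    """Return a simplified form of an enzyme name for comparison."""
    text = name.lower().strip()
    text = TRAILING_PAREN_RE.sub("", text)
    for prefix in PREFIXES:
        if text.startswith(prefix + " "):
            text = text[len(prefix) + 1:]
    text = SUFFIX_RE.sub("", text)
    # Finally drop the punctuation characters, including spaces.
    return text.translate(str.maketrans("", "", PUNCTUATION))


if __name__ == "__main__":
    a = normalize_enzyme_name("Putative UDP-glucose 4-epimerase (galE)")
    b = normalize_enzyme_name("UDP-glucose-4-epimerase")
    print(a, b, a == b)
```

Two names are then treated as matching when their normalized forms are equal; anything looser than that (partial or fuzzy matching) is outside what the slide describes.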

Enzyme Name Matcher
- For names that do not match, software identifies probable metabolic enzymes as those
  - Containing “ase”
  - Not containing keywords such as
    - “sensor kinase”
    - “topoisomerase”
    - “protein kinase”
    - “peptidase”
    - Etc.
- User should research unknown enzymes
  - MetaCyc, Swiss-Prot, PubMed
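The probable-enzyme rule is a plain substring heuristic, which a short sketch makes concrete. The function `is_probable_enzyme` is hypothetical, and the exclusion list holds only the four keywords shown on the slide; the slide's "Etc" indicates the real list is longer.

```python
# Keywords named on the slide; the actual exclusion list is longer.
EXCLUDED_KEYWORDS = ("sensor kinase", "topoisomerase", "protein kinase", "peptidase")


def is_probable_enzyme(name: str) -> bool:
    """Flag an unmatched product name as a probable metabolic enzyme.

    The name must contain "ase" and must not contain an excluded keyword.
    """
    lowered = name.lower()
    if "ase" not in lowered:
        return False
    return not any(keyword in lowered for keyword in EXCLUDED_KEYWORDS)


if __name__ == "__main__":
    for product in [
        "purine nucleoside phosphorylase",
        "DNA topoisomerase I",
        "hypothetical protein",
    ]:
        print(product, "->", is_probable_enzyme(product))
```

Because the test is a bare substring check, it will also flag names that merely happen to contain "ase", which is one reason the slide asks the user to research each candidate.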

Stored in ORGIDcyc/VERSION/reports/name-matching-report.txt

Automated Pathway Inference
- All pathways in MetaCyc for which there is at least one enzyme identified in the target organism are considered for possible inclusion.
- Algorithm errs on the side of inclusivity: it is easier to manually delete a pathway from an organism than to find a pathway that should have been predicted but wasn’t.

Considerations taken into account when deciding whether or not a pathway should be inferred:
- Is there a unique enzyme (an enzyme not involved in any other pathway)?
- Does the organism fall in the expected taxonomic domain of the pathway?
- Is this pathway part of a variant set, and, if so, is there more evidence for some other variant?
- If there is no unique enzyme:
  - Is there evidence for more than one enzyme?
  - If a biosynthetic pathway, is there evidence for final reaction(s)?
  - If a degradation pathway, is there evidence for initial reaction(s)?
  - If an energy metabolism pathway, is there evidence for more than half the reactions?

Assigning Evidence Scores to Predicted Pathways
- X|Y|Z denotes the score for pathway P in organism O, where:
  - X = total number of reactions in P
  - Y = number of reactions in P for which there is evidence of a catalyzing enzyme in O
  - Z = number of Y reactions that are used in other pathways in O
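A small worked example may make the X|Y|Z score easier to read. The sketch below computes it from sets of reaction identifiers; the function `evidence_score`, its parameters, and the reaction IDs are all hypothetical, and representing pathways as plain sets of reaction names is a simplification chosen for illustration.

```python
def evidence_score(pathway_reactions, reactions_with_evidence, other_pathways):
    """Return the X|Y|Z evidence score of one pathway as a string.

    pathway_reactions:       set of reaction IDs in the pathway P
    reactions_with_evidence: set of reaction IDs with an enzyme found in O
    other_pathways:          iterable of reaction-ID sets, one per other
                             pathway in O
    """
    x = len(pathway_reactions)
    supported = pathway_reactions & reactions_with_evidence
    y = len(supported)
    used_elsewhere = set().union(*other_pathways)
    z = len(supported & used_elsewhere)
    return f"{x}|{y}|{z}"


if __name__ == "__main__":
    # Hypothetical pathway of four reactions; two have enzyme evidence,
    # and one of those two also occurs in another pathway.
    score = evidence_score(
        {"rxn1", "rxn2", "rxn3", "rxn4"},
        {"rxn1", "rxn2", "rxn9"},
        [{"rxn2", "rxn7"}],
    )
    print(score)  # 4|2|1
```

Read this way, a pathway scoring 4|2|1 has two of its four reactions supported, and only one of those is evidence specific to this pathway.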

Pathway Evidence Report
- On Organism Summary Page in Navigator, button “Generate Pathway Evidence Report”
- Report saved as HTML file, view in browser
- Hierarchical listing of all inferred pathways
  - “Pathway Glyph” shows evidence graphically
    - Steps with/without enzymes (green/black)
    - Steps that are unique to pathway (orange)
    - Steps filled by Pathway Hole Filler (blue)
  - Counts reactions in pathway, with evidence, in other pathways
  - Lists other pathways that share reactions
  - Link to pathway in MetaCyc


Manual Pruning of Pathways
- Use pathway evidence report
  - Coloring scheme aids in assessing pathway evidence
- Phase I: Prune extra variant pathways
- Rescore pathways, re-generate pathway evidence report
- Phase II: Prune pathways unlikely to be present
  - No/few unique enzymes
  - Most pathway steps present because they are used in another pathway
  - Pathway very unlikely to be present in this organism
  - Nonspecific enzyme name assigned to a pathway step

Caveats
- Cannot predict pathways not present in MetaCyc
- Evidence for short pathways is hard to interpret
- Since many reactions occur in multiple pathways, some false positives are predicted
- A next-generation pathway inference algorithm is currently in progress!

Output from PPP
- Pathway/genome database
- Summary pages
  - Pathway evidence page
    - Click “Summary of Organisms”, then click organism name, then click “Pathway Evidence”, then click “Save Pathway Report”
  - Missing enzymes report
- Directory tree containing sequence files, reports, etc.

Resulting Directory Structure
ROOT/ptools-local/pgdbs/user/ORGIDcyc/VERSION/
- input
  - organism.dat
  - organism-init.dat
  - genetic-elements.dat
  - annotation files
  - sequence files
- reports
  - name-matching-report.txt
  - trial-parse-report.txt
- kb
  - ORGIDbase.ocelot
- data
  - overview.graph
- released -> VERSION
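When scripting around a PGDB (for example, to collect reports after a build), it can be handy to derive the file locations from the layout above. This is a minimal sketch: the function `pgdb_paths` and its dictionary keys are hypothetical, and it assumes ROOT, ORGID, and VERSION are substituted literally into the path, which may not hold for letter case on every installation.

```python
from pathlib import PurePosixPath


def pgdb_paths(root: str, orgid: str, version: str) -> dict:
    """Build the expected locations of key PGDB files from the slide's layout."""
    base = (
        PurePosixPath(root) / "ptools-local" / "pgdbs" / "user"
        / f"{orgid}cyc" / version
    )
    return {
        "base": base,
        "genetic_elements": base / "input" / "genetic-elements.dat",
        "name_matching_report": base / "reports" / "name-matching-report.txt",
        "trial_parse_report": base / "reports" / "trial-parse-report.txt",
        "kb": base / "kb" / f"{orgid}base.ocelot",
        "overview": base / "data" / "overview.graph",
    }


if __name__ == "__main__":
    # Hypothetical root directory, organism ID, and version.
    for label, path in pgdb_paths("/home/me", "test", "1.0").items():
        print(f"{label}: {path}")
```

`PurePosixPath` is used so the example behaves the same on any platform; a script that actually opens the files would use `pathlib.Path` instead.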

Manual Polishing
- Refine -> Assign Probable Enzymes: do this first
- Refine -> Rescore Pathways: redo after assigning enzymes
- Refine -> Create Protein Complexes: can be done at any time
- Refine -> Assign Modified Proteins: can be done at any time
- Refine -> Transport Identification Parser: can be done at any time
- Refine -> Pathway Hole Filler
- Refine -> Predict Transcription Units
- Refine -> Update Overview: do this last, and repeat after any material changes to PGDB

Assign Probable Enzymes

How to find reactions for probable enzymes
- First, verify that the enzyme name describes a specific, metabolic function
- Search for a fragment of the name in MetaCyc; you may be able to find a match that PathoLogic missed
- Look up the protein in UniProt or other DBs
- Search for the gene name in a PGDB for a related organism (bear in mind that gene names are not reliable indicators of function, so check carefully)
- Search for the function name in PubMed
- Other…