Published by Tobias Short. Modified over 9 years ago.
1
SRI International Bioinformatics
Computing with Pathway/Genome Databases
2
Overview
- Summary of Pathway Tools data access mechanisms and formats
- Pathway Tools APIs
- Overview of Pathway Tools schema
3
Writing Complex PGDB Queries
- Complex queries to PGDBs must refer to classes and slots within the schema
  - Queries using Lisp, Perl, Java APIs
  - Queries using Structured Advanced Query Form
  - Queries using BioVelo
4
Pathway Tools Implementation Details
- Platforms: Macintosh, PC/Linux, and PC/Windows
- Same binary can run as desktop app or Web server
- Production-quality software
  - Version control
  - Two regular releases per year
  - Extensive quality assurance
  - Extensive documentation
  - Auto-patch
  - Automatic DB-upgrade
- 420,000 lines of Lisp code
5
More Information
- Pathway Tools Web Site, Tutorial Slides
  - http://bioinformatics.ai.sri.com/ptools/
  - http://bioinformatics.ai.sri.com/ptools/examples.lisp
- PerlCyc & JavaCyc API, includes some relationships
  - http://www.arabidopsis.org/tools/aracyc/perlcyc/
  - http://www.arabidopsis.org/tools/aracyc/javacyc/
- Pathway Tools User’s Guide
  - Appendix: Guide to the Pathway Tools Schema
- Curator's Guide
  - http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf
- aic/pathway-tools/nav/12.0/lisp/relationships.lisp
6
References
- Ontology Papers section of http://biocyc.org/publications.shtml
  - "An Evidence Ontology for use in Pathway/Genome Databases"
  - "An ontology for biological function based on molecular interactions"
  - "Representations of metabolic knowledge: Pathways"
  - "Representations of metabolic knowledge"
7
Data Exchange
- APIs: Lisp API, Java API, and Perl API
  - Read and modify access
- Cyclone
- Export to files
  - BioPAX export (biopax.org)
  - Export PGDB genome to GenBank format
  - Export entire PGDB as column-delimited and attribute-value file formats
  - Export PGDB reactions as SBML (sbml.org)
  - Import/Export of Pathways: between PGDBs
  - Import/Export of Selected Frames, for Spreadsheets
  - Import/Export of Compounds as Molfile, CML
- BioWarehouse: Loader for Flatfiles, SQL access
  - http://bioinformatics.ai.sri.com/biowarehouse/
  - BMC Bioinformatics 7:170 2006
8
Programmatic Access to BioCyc
- Common LISP
  - Native language of Pathway Tools
  - Interactive & Mature Environment
  - Full Access to the Data & Many Utility Functions
  - Source code is available for academics
- PerlCyc
  - API of Functions, Exposed to Perl
  - Communication through UNIX Socket
- JavaCyc
  - API of Functions, Exposed to Java
  - Communication through UNIX Socket
- Cyclone
9
Cyclone
- Developed by Schachter and colleagues from Genoscope
  - http://nemo-cyclone.sourceforge.net/archi.php
- Cyclone is a Java-based system that:
  - Extracts data from a Pathway Tools PGDB
  - Converts it to an XML schema
  - Maps the data to Java objects and to a relational database
- Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB
10
Lisp API
- Accessible whenever you start Pathway Tools with the -lisp argument
- Lisp queries are evaluated inside the running Pathway Tools binary and execute very fast
11
Generic Frame Protocol (GFP)
- A library of procedures for accessing Ocelot DBs
- GFP specification:
  - http://www.ai.sri.com/~gfp/spec/paper/paper.html
- A small number of GFP functions are sufficient for most complex queries
12
Example of a Single GFP Call
- The general pattern:
    gfp-function(frame-ID slot-ID value...)
- In Lisp:
    (gfp-function frame-ID slot-ID value ...)
- Example:
    (get-slot-values 'TRYPSYN-RXN 'LEFT)
    ==> (INDOLE-3-GLYCEROL-P SER)
13
Generic Frame Protocol
- get-class-all-instances (Class)
  - Returns the instances of Class
- coercible-to-frame-p (Thing)
  - Is Thing a frame? Returns True if Thing is the name of a frame, or a frame object; else False
14
Generic Frame Protocol
- Notation: Frame.Slot means a specified slot of a specified frame
- get-slot-value (Frame Slot)
  - Returns first value of Frame.Slot
- get-slot-values (Frame Slot)
  - Returns all values of Frame.Slot as a list
- slot-has-value-p (Frame Slot)
  - Returns True if Frame.Slot has at least one value; else False
- member-slot-value-p (Frame Slot Value)
  - Returns True if Value is one of the values of Frame.Slot; else False
- print-frame (Frame)
  - Prints the contents of Frame
- Note: Frame and Slot must be symbols!
15
Generic Frame Protocol – Update Operations
- put-slot-value (Frame Slot Value)
  - Replace the current value(s) of Frame.Slot with Value
- put-slot-values (Frame Slot Value-List)
  - Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
- add-slot-value (Frame Slot Value)
  - Add Value to the current value(s) of Frame.Slot, if any
- remove-slot-value (Frame Slot Value)
  - Remove Value from the current value(s) of Frame.Slot
- replace-slot-value (Frame Slot Old-Value New-Value)
  - In Frame.Slot, replace Old-Value with New-Value
- remove-local-slot-values (Frame Slot)
  - Remove all of the values of Frame.Slot
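The read and update semantics listed for the GFP slot operations can be illustrated with a minimal Java sketch. `FrameStoreSketch` is a hypothetical in-memory stand-in written for this illustration; it is not the Pathway Tools, GFP, or JavaCyc implementation, and details such as whether add-slot-value suppresses duplicates are assumptions.

```java
import java.util.*;

// Hypothetical in-memory stand-in for a frame KB. It only mimics the slot
// semantics listed on the slides; it is not the Pathway Tools or JavaCyc API.
final class FrameStoreSketch {
    // frame ID -> slot ID -> ordered list of values
    private final Map<String, Map<String, List<String>>> frames = new HashMap<>();

    private List<String> values(String frameId, String slotId) {
        return frames.computeIfAbsent(frameId, f -> new HashMap<>())
                     .computeIfAbsent(slotId, s -> new ArrayList<>());
    }

    // get-slot-values: all values of Frame.Slot as a list
    List<String> getSlotValues(String frameId, String slotId) {
        return new ArrayList<>(values(frameId, slotId));
    }

    // get-slot-value: first value of Frame.Slot (null plays the role of NIL)
    String getSlotValue(String frameId, String slotId) {
        List<String> vs = values(frameId, slotId);
        return vs.isEmpty() ? null : vs.get(0);
    }

    boolean slotHasValueP(String frameId, String slotId) {
        return !values(frameId, slotId).isEmpty();
    }

    boolean memberSlotValueP(String frameId, String slotId, String value) {
        return values(frameId, slotId).contains(value);
    }

    // put-slot-values: replace the current value(s) with the given list
    void putSlotValues(String frameId, String slotId, List<String> newValues) {
        List<String> vs = values(frameId, slotId);
        vs.clear();
        vs.addAll(newValues);
    }

    // add-slot-value: append to the current values (assumption: no duplicate check)
    void addSlotValue(String frameId, String slotId, String value) {
        values(frameId, slotId).add(value);
    }

    void removeSlotValue(String frameId, String slotId, String value) {
        values(frameId, slotId).remove(value);
    }

    void replaceSlotValue(String frameId, String slotId, String oldValue, String newValue) {
        Collections.replaceAll(values(frameId, slotId), oldValue, newValue);
    }

    // remove-local-slot-values: remove all values of Frame.Slot
    void removeLocalSlotValues(String frameId, String slotId) {
        values(frameId, slotId).clear();
    }

    public static void main(String[] args) {
        FrameStoreSketch kb = new FrameStoreSketch();
        kb.putSlotValues("TRYPSYN-RXN", "LEFT", Arrays.asList("INDOLE-3-GLYCEROL-P", "SER"));
        System.out.println(kb.getSlotValue("TRYPSYN-RXN", "LEFT"));   // first value only
        System.out.println(kb.getSlotValues("TRYPSYN-RXN", "LEFT"));  // all values
        kb.removeLocalSlotValues("TRYPSYN-RXN", "LEFT");
        System.out.println(kb.slotHasValueP("TRYPSYN-RXN", "LEFT"));  // slot is now empty
    }
}
```

The point of the sketch is the distinction between the single-value and list-value accessors: get-slot-value silently returns only the first value, so multi-valued slots should be read with get-slot-values.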
16
Generic Frame Protocol – Update Operations
- save-kb
  - Saves the current KB
17
Additional Pathway Tools Functions – Semantic Inference Layer
- Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
- http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
18
PerlCyc and JavaCyc
- Work on Unix (Solaris or Linux) only
- Start up Pathway Tools with the -api arg
- Pathway Tools listens on a Unix socket; the Perl or Java program communicates through this socket
- Supports both querying and editing PGDBs
- Must run the Perl or Java program on the same machine that runs Pathway Tools
  - This is a security measure, as the API server has no built-in security
- Can only handle one connection at a time
19
Obtaining PerlCyc and JavaCyc
- Download from http://www.sgn.cornell.edu/downloads/
- PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research
- JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller
- Easy to extend…
20
Examples of PerlCyc, JavaCyc Functions
- GFP functions (require knowledge of Pathway Tools schema):

    PerlCyc                    JavaCyc
    get_slot_values            getSlotValues
    get_class_all_instances    getClassAllInstances
    put_slot_values            putSlotValues

- Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html):

    PerlCyc                    JavaCyc
    genes_of_reaction          genesOfReaction
    find_indexed_frame         findIndexedFrame
    pathways_of_gene           pathwaysOfGene
    transport_p                transportP
21
Writing a PerlCyc or JavaCyc program
- Create a PerlCyc or JavaCyc object:

    perlcyc -> new("ORGID")
    new Javacyc("ORGID")

- Call PerlCyc or JavaCyc functions on this object:

    my $cyc = perlcyc -> new("ECOLI");
    my @pathways = $cyc -> all_pathways();

    Javacyc cyc = new Javacyc("ECOLI");
    ArrayList pathways = cyc.allPathways();

- Functions return object IDs, not objects
  - Must connect to server again to retrieve attributes of an object

    foreach my $p (@pathways) {
        print $cyc -> get_slot_value($p, "COMMON-NAME");
    }

    for (int i = 0; i < pathways.size(); i++) {
        String pwy = (String) pathways.get(i);
        System.out.println(cyc.getSlotValue(pwy, "COMMON-NAME"));
    }
22
Sample PerlCyc Query
Number of proteins in E. coli

    use perlcyc;
    my $cyc = perlcyc -> new("ECOLI");
    my @proteins = $cyc -> get_class_all_instances("|Proteins|");
    my $protein_count = scalar(@proteins);
    print "Protein count: $protein_count.\n";
23
Sample PerlCyc Query
Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5.

    use perlcyc;
    my $cyc = perlcyc -> new("ECOLI");
    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        my $mw = $cyc->get_slot_value($p, "molecular-weight-kd");
        my $pI = $cyc->get_slot_value($p, "pi");
        # Range tests on both slots
        if ($mw >= 10 && $mw <= 20 && $pI >= 4 && $pI <= 5) {
            print "$p\n";
        }
    }
24
Sample PerlCyc Query
List all the transcription factors in E. coli, and the list of genes that each regulates:

    use perlcyc;
    my $cyc = perlcyc -> new("ECOLI");
    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        if ($cyc->transcription_factor_p($p)) {
            my $name = $cyc->get_slot_value($p, "common-name");
            my %genes = ();
            # Collect the genes of every transcription unit in the regulon
            foreach my $tu ($cyc->regulon_of_protein($p)) {
                foreach my $g ($cyc->transcription_unit_genes($tu)) {
                    $genes{$g} = $cyc->get_slot_value($g, "common-name");
                }
            }
            print "\n\n$name: ";
            print join " ", values %genes;
        }
    }
25
Sample Editing Using PerlCyc
Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases)

    use perlcyc;
    my $cyc = perlcyc -> new("HPY");
    my @genes = $cyc->get_class_all_instances("|Genes|");
    foreach my $g (@genes) {
        $cyc->add_slot_value($g, "DBLINKS", "(MY-DB \"$g\")");
    }
    $cyc->save_kb();
26
Sample JavaCyc Query: Enzymes for which ATP is a regulator

    import java.util.*;

    public class JavacycSample {
        public static void main(String[] args) {
            Javacyc cyc = new Javacyc("ECOLI");
            ArrayList regframes = cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|");
            for (int i = 0; i < regframes.size(); i++) {
                String reg = (String) regframes.get(i);
                boolean bool = cyc.memberSlotValueP(reg, "Regulator", "ATP");
                if (bool) {
                    // Follow the regulation frame to its enzymatic reaction, then to the enzyme
                    String enzrxn = cyc.getSlotValue(reg, "Regulated-Entity");
                    String enzyme = cyc.getSlotValue(enzrxn, "Enzyme");
                    System.out.println(enzyme);
                }
            }
        }
    }
27
Simple Lisp Query Example: Enzymes for which ATP is a regulator

    (defun atp-inhibits ()
      (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|)
            ;; Does the Regulator slot contain the compound ATP, and is the
            ;; mode of regulation negative (inhibition)?
            when (and (member-slot-value-p x 'Regulator 'ATP)
                      (member-slot-value-p x 'Mode "-"))
            ;; Whenever the test is positive, we collect the value of the slot Enzyme
            ;; of the Regulated-Entity of the regulatory interaction frame.
            ;; The collected values are returned as a list, once the loop terminates.
            collect (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme)))

    ;;; invoking the query:
    (select-organism :org-id 'ECOLI)
    (atp-inhibits)
28
Simple Perl Query Example: Enzymes for which ATP is a regulator

    use perlcyc;
    my $cyc = perlcyc -> new("ECOLI");
    my @regs = $cyc -> get_class_all_instances("|Regulation-of-Enzyme-Activity|");
    ## We check every instance of the class
    foreach my $reg (@regs) {
        ## We test whether the Regulator slot contains the compound frame ATP
        ## and whether the Mode slot is "-" (inhibition)
        my $bool1 = $cyc -> member_slot_value_p($reg, "Regulator", "ATP");
        my $bool2 = $cyc -> member_slot_value_p($reg, "Mode", "-");
        if ($bool1 && $bool2) {
            ## Whenever the test is positive, we collect the value of the slot Enzyme
            ## of the regulated enzymatic reaction.
            ## The results are printed in the terminal.
            my $enzrxn = $cyc -> get_slot_value($reg, "Regulated-Entity");
            my $enz = $cyc -> get_slot_value($enzrxn, "Enzyme");
            print STDOUT "$enz\n";
        }
    }
29
Getting started with Lisp
- pathway-tools -lisp
- (load "file")
- (compile-file "file.lisp")
- Emacs is a useful editor
- Pathway Tools source code is available: ask
- Lisp resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
30
Viewing Results via the Answer List
    (replace-answer-list (query))
31
Query Gotchas
- Study schema carefully
- :test #'fequal
- Cascade of slot-values: check for NIL
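The "check for NIL" gotcha applies whenever one slot lookup feeds the next. A minimal Java sketch of the guarded cascade follows; the `Map`-based slot table and the frame IDs `REG-1`, `ENZRXN-1`, and `ENZYME-1` are hypothetical stand-ins, not real PGDB frames or JavaCyc calls.

```java
import java.util.*;

// Hypothetical single-valued slot table, keyed "FRAME.SLOT"; null plays the role of NIL.
final class CascadeSketch {
    static String slotValue(Map<String, String> kb, String frame, String slot) {
        return kb.get(frame + "." + slot);
    }

    // Enzyme of the Regulated-Entity of a regulation frame, or null when
    // either link in the chain is missing.
    static String enzymeOfRegulation(Map<String, String> kb, String reg) {
        String enzrxn = slotValue(kb, reg, "Regulated-Entity");
        if (enzrxn == null) {
            return null;   // stop here instead of querying a NIL frame
        }
        return slotValue(kb, enzrxn, "Enzyme");
    }

    public static void main(String[] args) {
        Map<String, String> kb = new HashMap<>();
        kb.put("REG-1.Regulated-Entity", "ENZRXN-1");   // hypothetical frame IDs
        kb.put("ENZRXN-1.Enzyme", "ENZYME-1");
        System.out.println(enzymeOfRegulation(kb, "REG-1"));
        System.out.println(enzymeOfRegulation(kb, "REG-2"));   // no Regulated-Entity
    }
}
```

The same guard belongs in the PerlCyc and JavaCyc samples wherever the result of one get_slot_value call is passed as the frame argument of another.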
32
Semantic Inference Layer
- relationships.lisp
- Library of functions that encapsulate common query building blocks and intricacies of navigating the schema
  - enzymes-of-gene
  - reactions-of-gene
  - pathways-of-gene
  - genes-of-pathway
  - pathway-hole-p
  - reactions-of-compound
  - top-containers(protein)
  - all-rxns(type) (:metab-smm :metab-all :metab-pathways :enzyme :transport etc.)
    - (all-rxns :metab-pathways)
33
Pathway Tools Schema and Semantic Inference Layer
34
Pathway Tools Ontology / Schema
- Ontology classes: 1621
  - Datatype classes: Define objects from genomes to pathways
  - Classification systems / controlled vocabularies
    - Pathways, chemical compounds, enzymatic reactions (EC system)
    - Protein Feature ontology
    - Cell Component Ontology
    - Evidence Ontology
- Comprehensive set of 279 attributes and relationships
35
Polynucleotides
36
Use GKB Editor to Inspect the Pathway Tools Ontology
- GKB Editor = Generic Knowledge Base Editor
- Type in Navigator window: (GKB) or [Right-Click] Edit->Ontology Editor
- View->Browse Class Hierarchy
- [Middle-Click] to expand hierarchy
- To view classes or instances, select them and:
  - Frame -> List Frame Contents
  - Frame -> Edit Frame
37
Use the SAQP to Inspect the Schema
38
Pathway Tools Schema
- Appendix of Pathway Tools User’s Guide
- Schema overview diagram
39
Root Classes in the Pathway Tools Ontology
- Chemicals -- All molecules
- Polymer-Segments -- Regions of polymers
- Protein-Features -- Features on proteins
- Paralogous-Gene-Groups
- Organisms
- Generalized-Reactions -- Reactions and pathways
- Enzymatic-Reactions -- Link enzymes to reactions they catalyze
- Regulation -- Regulatory interactions
- CCO -- Cell Component Ontology
- Evidence -- Evidence ontology
- Notes -- Timestamped, person-stamped notes
- Organizations
- People
- Publications
40
Principal Classes
- Class names are usually capitalized, plural, separated by dashes
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
  - rRNAs, snRNAs, tRNAs, Charged-tRNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes
41
Principal Classes
- Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements
- Regulation
42
Semantic Network Diagrams
[Diagram. Nodes: genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction Succinate + FAD = fumarate + FADH2; TCA Cycle. Edge labels: product, component-of, catalyzes, reaction, in-pathway.]
43
Pathway Tools Schema and Semantic Inference Layer
Genes, Operons, and Replicons
44
Representing a Genome
- Classes:
  - ORG is of class Organisms
  - CHROM1 is of class Chromosomes
  - PLASMID1 is of class Plasmids
  - Gene1 is of class Genes
  - Product1 is of class Polypeptides or RNA
[Diagram. Nodes: ORG, CHROM1, CHROM2, PLASMID1, Gene1, Gene2, Gene3, Product1. Edge labels: genome, components, product.]
45
    ;; Genes among the components of a chromosome
    (defun genes-of-chrom (chrom)
      (loop for x in (get-slot-values chrom 'components)
            when (instance-all-instance-of-p x '|Genes|)
            collect x))
46
Polynucleotides
- Review slots of COLI and of COLI-K12
47
Genetic-Elements
- Sequence is stored in
  - File PGDB: A separate file
  - Relational DBMS PGDB: A relational database table
48
Polymer-Segments
- Review slots of Genes
49
Complexities of Gene / Gene-Product Relationships
- The Product of a gene can be an instance of Polypeptides or RNAs
- An instance of Polypeptides can have more than one gene encoding it
- Sequence position:
  - Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (Right-End-Position is usually greater, except at origin)
  - Transcription-Direction + / -
- Alternative splicing:
  - Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position
  - Intron positions specified in Splice-Form-Introns of gene product
    - (200 300) (350 400)
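As a worked illustration of the splicing bullets, the exon intervals of a gene product can be derived from the gene span and its intron list. The Java sketch below uses the intron pairs from the slide; the gene span 100..500, the inclusive-coordinate convention, and the assumption that introns are sorted, non-overlapping, and given in absolute positions are all made up for the example rather than taken from the schema documentation.

```java
import java.util.*;

final class ExonSketch {
    // Returns the exon intervals {start, end} left after removing introns from
    // the span [left, right]. Assumes inclusive coordinates and introns that are
    // sorted, non-overlapping, and inside the gene.
    static List<int[]> exons(int left, int right, int[][] introns) {
        List<int[]> out = new ArrayList<>();
        int start = left;
        for (int[] intron : introns) {
            if (intron[0] > start) {
                out.add(new int[] {start, intron[0] - 1});
            }
            start = intron[1] + 1;   // next exon begins just after this intron
        }
        if (start <= right) {
            out.add(new int[] {start, right});
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] introns = { {200, 300}, {350, 400} };   // Splice-Form-Introns from the slide
        for (int[] e : exons(100, 500, introns)) {      // gene span 100..500 is made up
            System.out.println(Arrays.toString(e));
        }
    }
}
```

With these inputs the sketch yields three exons: 100..199, 301..349, and 401..500.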
50
Gene Reaction Schematic
51
Proteins
52
Proteins and Protein Complexes
- Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level)
- Protein complex: a protein consisting of multiple polypeptides or protein complexes
- Example: DNA pol III
  - DnaE is a polypeptide
  - pol III core enzyme contains DnaE, DnaQ, HolE
  - pol III holoenzyme contains pol III core enzyme plus three other complexes
53
Protein Complex Relationships
54
Slots of a protein (DnaE)
- catalyzes
- Is it an activator/reactant/etc?
- comments
- component-of
- dblinks
- features (edited in feature editor)
- Many other features possible
55
A complex at the frame level (pol III)
- Same features as polypeptide frame, different use
- comment
- component-of and components
  - note coefficients
56
Protein Complex Relationships
57
Relationships are Defined in Many Places
- component-of comes from creating a complex
- appears-in-left-side-of comes from defining a reaction (as do modified forms)
- inhibitor-of comes from an enzymatic reaction
- Can only edit dna-footprint if protein has been associated with a TU
58
Semantic Inference Layer
- Reactions-of-protein (prot)
  - Returns a list of rxns this protein catalyzes
- Transcription-units-of-proteins (prot)
  - Returns a list of TUs activated/inhibited by the given protein
- Transporter? (prot)
  - Is this protein a transporter?
- Polypeptide-or-homomultimer? (prot)
- Transcription-factor? (prot)
- Obtain-protein-stats
  - Returns 5 values
    - Length of: all-polypeptides, complexes, transporters, enzymes, etc…
59
Example
- Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group

    (loop for protein in (get-class-all-instances '|Proteins|)
          for enzrxn = (get-slot-value protein 'enzymatic-reaction)
          ;; enzrxn is tested first so that NIL is never passed on as a frame
          when (and enzrxn
                    (or (member-slot-value-p enzrxn 'cofactors 'pyridoxal_phosphate)
                        (member-slot-value-p enzrxn 'prosthetic-groups 'pyridoxal_phosphate)))
          collect protein)

- (member-slot-value-p frame slot value): T if Value is one of the values of Slot of Frame
60
Sample
- Find all proteins without a comment anywhere
61
Compounds / Reactions / Pathways
62
Compounds / Reactions / Pathways
- Think of a three-tiered structure:
  - Reactions built on top of compounds
  - Pathways built on top of reactions
- Metabolic network defined by reactions alone; pathways are an additional “optional” structure
- Some reactions not part of a pathway
- Some reactions have no attached enzyme
- Some enzymes have no attached gene
63
Compounds
65
Compounds
- Relatively few aspects of a compound defined within the compound editor
  - MW, formula calculated from edited structure
- Most aspects defined in other editors
  - “Pathway reactions” comes from reaction editing followed by pathway editing
  - Activator, etc come from the enzymatic reaction editor
66
    -- Instance TRP ---
    Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids|
    APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN
    APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN
    CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)
    COMMON-NAME: "L-tryptophan"
    DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3")
    NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
    SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))"
    SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
    ____________________________________________
67
Where is diphosphate in the ontology?
68
Semantic Inference Layer
- Reactions-of-compound (cpd)
- Pathways-of-compound (cpd)
- Is-substrate-an-autocatalytic-enzyme-p (cpd)
- Activated/inhibited-by? (cpds slots)
  - Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric)
- All-substrates (rxns)
  - All unique substrates specified in the given rxns
- Has-structure-p (cpd)
- Obtain-cpd-stats
  - Returns two values:
    - Length of: all-cpds, cpds with structures
69
Miscellaneous things…
- History List
  - Back/Forward and History buttons
  - Default list is 50 items
- Show frame
    (print-frame 'frame)
71
Queries with Multiple Answers
- Navigator queries:
  - Example: Substring search for “pyruvate”
  - Selected list is placed on the Answer list
  - Use “Next Answer” button to view each one of them
- Lisp queries:
  - Example: Find reactions involving pyruvate as a substrate

    (get-class-all-instances '|Compounds|)

    (loop for rxn in (get-class-all-instances '|Reactions|)
          when (member 'pyruvate (get-slot-values rxn 'substrates))
          collect rxn)

    ;; * is the value of the previous expression at the Lisp prompt
    (replace-answer-list *)
72
Reactions
73
Enzymatic Reactions (DnaE and 2.7.7.7)
- A necessary bridge between enzymes and “generic” versions of reactions
- Carries information specific to an enzyme/reaction combination:
  - Cofactors and prosthetic groups
  - Alternative substrates
  - Links to regulatory interactions
- Frame is generated when protein is associated with reaction (via protein or reaction editor)
75
Regulation of Enzyme Activity
76
Reactions
- Represents information about a reaction that is independent of enzymes that catalyze the reaction
- Connected to enzyme(s) via enzymatic reaction frames
- Classified with EC system when possible
- Example: 2.7.7.7 – DNA-directed DNA polymerization
  - Carried out by five enzymes in E. coli
77
Reaction Ontology
78
Where is 2.7.7.7 in the Ontology?
79
Slots of Reaction Frames
- Balance-state
- EC-number
- Enzymatic-reaction
  - Generated in protein or reaction editor
- In-pathway
  - Generated in pathway editor
- Left and Right (reactants / products)
  - Can include modified forms of proteins, RNAs, etc here
  - Not all reactants/products need to be frames
81
Reaction relationships
82
Semantic Inference Layer
- Genes-of-reaction (rxn)
- Substrates-of-reaction (rxn)
- Enzymes-of-reaction (rxn)
- Lacking-ec-number (organism)
  - Returns list of rxns with no EC numbers in that database
- Get-reaction-direction-in-pathway (pwy rxn)
- Reaction-type (rxn)
  - Indicates types of Rxn as: small-molecule rxn, transport rxn, protein-small-molecule rxn (one substrate is protein and one is a small molecule), protein rxn (all substrates are proteins), etc.
- All-rxns (type)
  - Specify the type of reaction (see above for type)
- Obtain-rxn-stats
  - Returns six values
    - Length of: all-rxns, transport, non-transport, etc…
83
Find all small-molecule reactions that have no enzyme but are not spontaneous (“orphan” reactions)

    (defun orphan-reactions (&optional (verbose? t))
      (loop for r in (all-rxns :small-molecule)
            when (and (not (slot-has-value-p r 'enzymatic-reaction))
                      (not (get-slot-value r 'spontaneous?)))
            collect r))
84
Reaction Direction
- Left/Right reflect direction of reaction as written by Enzyme Commission
  - Reflects systematic direction for different reaction classes
- Left/Right do not necessarily correspond to physiological direction of a reaction
- Get-rxn-direction (rxn)
  - Returns :L2R or :R2L or :BOTH or NIL
  - Integrates all available info about direction of this reaction
    - Direction(s) it occurs in all pathways in the PGDB
    - Direction(s) as specified in Enzymatic-Reactions
85
RNAs
86
RNAs
- PGDBs only represent RNAs that are “terminal gene products”
  - tRNAs
  - rRNAs
  - Regulatory RNAs
  - Miscellaneous small RNAs
- Slots similar to proteins
- tRNAs can have an anticodon
88
The RNA Ontology
89
Pathway Tools Schema and Semantic Inference Layer: Pathways and the Overview
90
Outline
- Pathways
  - Representation of Pathways
  - Querying Pathways Programmatically
  - How Pathway Diagrams are Generated
  - Future Work: Signalling Pathways
- Cellular Overview Diagram
  - New Functionality
  - Under the Hood
  - How Overview Diagram is Generated
  - Using Overview Diagram for Global Queries
91
What is a Pathway?
- An ordered set of interconnected, directed biochemical reactions
- Reactions form a coherent unit, e.g.
  - Regulated as a single unit
  - Evolutionarily conserved across organisms as a single unit
  - When combined, perform a single cellular function
  - Historically grouped together as a unit
- Includes metabolic pathways and signalling pathways
- Evidence for all reactions in a single organism
- Pathways can be linear, cyclical, branched, or some combination
92
Internal Representation of Pathways
- REACTION-LIST: unordered list of reactions that comprise the pathway
- PREDECESSORS: list of reaction pairs that define ordering relationships between reactions

    (R2 R1) : Predecessor of R2 is R1
    (R3 R1) : Predecessor of R3 is R1
    (R1)    : R1 has no predecessor (can be omitted)

[Diagram: reactions R1, R2, R3 connecting compounds A, B, C, D]
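To show how an unordered REACTION-LIST plus PREDECESSORS pairs determine an ordering, here is a minimal Java sketch that runs a standard topological sort (Kahn's algorithm) over the pairs. This is an illustration of the data structure, not how Pathway Tools lays out pathways; the class and method names are hypothetical.

```java
import java.util.*;

final class PredecessorSketch {
    // Each pair is {reaction, predecessor}, mirroring the (R2 R1) notation.
    // Returns the reactions in an order where every predecessor comes first.
    static List<String> order(List<String> reactionList, String[][] predecessors) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unmet predecessor count
        Map<String, List<String>> successors = new HashMap<>();
        for (String r : reactionList) {
            pending.put(r, 0);
        }
        for (String[] p : predecessors) {
            pending.merge(p[0], 1, Integer::sum);
            successors.computeIfAbsent(p[1], k -> new ArrayList<>()).add(p[0]);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());   // reactions with no predecessor start the pathway
            }
        }
        List<String> out = new ArrayList<>();
        while (!ready.isEmpty()) {
            String r = ready.remove();
            out.add(r);
            for (String s : successors.getOrDefault(r, Collections.<String>emptyList())) {
                if (pending.merge(s, -1, Integer::sum) == 0) {
                    ready.add(s);
                }
            }
        }
        return out;   // shorter than reactionList when the predecessor pairs form a cycle
    }

    public static void main(String[] args) {
        String[][] preds = { {"R2", "R1"}, {"R3", "R1"} };
        System.out.println(order(Arrays.asList("R3", "R2", "R1"), preds));   // prints [R1, R2, R3]
    }
}
```

Cyclical pathways have no reaction without a predecessor, so a plain topological sort returns a partial or empty result for them; a real layout needs extra information about where to start.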
93
Main vs Side Substrates
[Diagram: example pathway with compounds A, B, C, D, E, F]
- Main compounds form the backbone of the pathway
  - substrates shared between connecting reactions
  - major inputs and outputs
- Side compounds omitted from pathway diagrams at low detail levels
- Individual reactions do not necessarily have main and side compounds: a particular substrate may be either a main or a side depending on the pathway context
94
Computing Directionality and Mains/Sides
- Our philosophy: Enable curator to specify as little as possible. Compute as much as possible. This reduces redundancy and potential for inconsistencies.
- Example:
  - Reactions

      R1: A + B = C + D
      R2: B = E

  - Predecessors: (R2 R1)
- Only substrate overlap is B
  - B must be a main substrate
  - A must be a side substrate
  - R1 must proceed from right to left: as the predecessor of R2 it has to produce B, and B is written on its left side
  - R2 must proceed from left to right: it has to consume B, and B is written on its left side
[Diagram: C + D -> B -> E, with A drawn as a side compound]
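The inference in this example can be sketched in Java as follows. The rule encoded here (a predecessor must produce the shared substrate, its successor must consume it, and overlap on both sides is ambiguous) is a simplified reading of the slide, not the actual Pathway Tools algorithm; all names are hypothetical.

```java
import java.util.*;

final class DirectionSketch {
    // Substrates that two reactions have in common, regardless of side.
    static Set<String> overlap(Set<String> l1, Set<String> r1, Set<String> l2, Set<String> r2) {
        Set<String> first = new TreeSet<>(l1);
        first.addAll(r1);
        Set<String> second = new TreeSet<>(l2);
        second.addAll(r2);
        first.retainAll(second);
        return first;
    }

    // Returns "L2R", "R2L", or null (no overlap, or overlap on both sides) for one
    // reaction, given the shared substrates and whether it must produce or consume them.
    static String direction(Set<String> left, Set<String> right,
                            Set<String> shared, boolean mustProduce) {
        boolean onLeft = !Collections.disjoint(left, shared);
        boolean onRight = !Collections.disjoint(right, shared);
        if (onLeft == onRight) {
            return null;   // ambiguous: cannot be decided from overlap alone
        }
        // Run left to right, a reaction consumes its left side and produces its right side.
        boolean leftToRight = mustProduce ? onRight : onLeft;
        return leftToRight ? "L2R" : "R2L";
    }

    public static void main(String[] args) {
        Set<String> r1Left = new TreeSet<>(Arrays.asList("A", "B"));
        Set<String> r1Right = new TreeSet<>(Arrays.asList("C", "D"));
        Set<String> r2Left = new TreeSet<>(Arrays.asList("B"));
        Set<String> r2Right = new TreeSet<>(Arrays.asList("E"));
        Set<String> shared = overlap(r1Left, r1Right, r2Left, r2Right);
        System.out.println("shared: " + shared);
        System.out.println("R1: " + direction(r1Left, r1Right, shared, true));    // predecessor
        System.out.println("R2: " + direction(r2Left, r2Right, shared, false));   // successor
    }
}
```

For the slide's example the sketch reports B as the only shared substrate, R1 as R2L, and R2 as L2R. A null result marks the cases where overlap alone cannot settle the direction.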
95
- Unfortunately, mains, sides and reaction directions are sometimes ambiguous:
  - At beginnings and ends of pathways
    - Use heuristics to determine main/side substrates at beginnings, ends of pathways
    - Not always what the curator wants
  - Substrate overlap with both sides of a reaction, e.g.

      A + B = C + D
      C + B = E

- Solution: Additional slot PRIMARIES, should only be populated when necessary:
  - PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product
- But…
96
More Complications…
- ENZYME-USE: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway
  - Not present in the same compartment with rest of pathway enzymes
  - Down-regulated or not expressed under conditions in which pathway is active
  - ENZYME-USE slot tells us which enzymes catalyze reaction in pathway, if not all
- LAYOUT-ADVICE: helps software draw pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top
- HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e. no direct experimental evidence)
97
Polymerization Pathways
[Diagram: … X[n], X[n+1], X[10]]
- POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link

    (X R1 R1)
    --- REACTANT-NAME-SLOT: N-NAME
    --- PRODUCT-NAME-SLOT: N+1-NAME

- CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if instance is not a member of some reaction, so no predecessor relationship can be defined)

    R1
    --- PRODUCT-INSTANCES: X[10]
98
Super-Pathways
- Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit
- Can contain both sub-pathways and additional connecting reactions
- Can be nested arbitrarily
- REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway
- PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway
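Because super-pathways can be nested arbitrarily, resolving a REACTION-LIST to plain reactions is a recursive expansion. A minimal Java sketch follows, assuming the pathway table is available as a map from pathway ID to its REACTION-LIST entries; the IDs and the map representation are hypothetical, and the same idea would apply to PREDECESSORS.

```java
import java.util.*;

final class SuperPathwaySketch {
    // pathways: pathway ID -> REACTION-LIST entries (reaction IDs or pathway IDs).
    // Returns the reactions of pwy with every nested pathway ID expanded.
    static Set<String> expand(String pwy, Map<String, List<String>> pathways) {
        Set<String> reactions = new LinkedHashSet<>();
        expandInto(pwy, pathways, reactions, new HashSet<>());
        return reactions;
    }

    private static void expandInto(String pwy, Map<String, List<String>> pathways,
                                   Set<String> out, Set<String> seen) {
        if (!seen.add(pwy)) {
            return;   // already expanded; also guards against accidental self-nesting
        }
        for (String entry : pathways.getOrDefault(pwy, Collections.<String>emptyList())) {
            if (pathways.containsKey(entry)) {
                expandInto(entry, pathways, out, seen);   // entry is a sub-pathway ID
            } else {
                out.add(entry);                           // entry is a reaction ID
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathways = new HashMap<>();   // hypothetical IDs
        pathways.put("SUPER", Arrays.asList("PWY-A", "R5", "PWY-B"));
        pathways.put("PWY-A", Arrays.asList("R1", "R2"));
        pathways.put("PWY-B", Arrays.asList("R3", "PWY-C"));
        pathways.put("PWY-C", Arrays.asList("R4"));
        System.out.println(expand("SUPER", pathways));   // prints [R1, R2, R5, R3, R4]
    }
}
```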
99
Pathway Links
- Can be used as an alternative or in addition to defining super-pathways
- Link must be to or from some main substrate in the pathway
- Other end of link can be a pathway, a reaction, or an arbitrary text string
- Software automatically computes direction of link, but curator can override it
100
Querying Pathways Programmatically
- See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
- (all-pathways)
- (base-pathways)
  - Returns list of all pathways that are not super-pathways
- (genes-of-pathway pwy)
- (unique-genes-of-pathway pwy)
  - Returns list of all genes of a pathway that are not also part of other pathways
- (enzymes-of-pathway pwy)
- (substrates-of-pathway pwy)
- (variants-of-pathway pwy)
  - Returns all pathways in the same variant class as a pathway
- (get-predecessors rxn pwy), (get-successors rxn pwy)
- (get-rxn-direction-in-pathway pwy rxn)
- (pathway-inputs pwy), (pathway-outputs pwy)
  - Returns all compounds consumed (produced) but not produced (consumed) by pathway (ignores stoichiometry)
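The definition given for pathway-inputs and pathway-outputs amounts to a set difference, which the following Java sketch makes concrete. It assumes each reaction has already been oriented in its pathway direction and is given as a pair of compound arrays; the compounds and class name are hypothetical, and this is not the Pathway Tools implementation.

```java
import java.util.*;

final class PathwayIoSketch {
    // Each reaction is {consumed compounds, produced compounds}, already oriented
    // in its pathway direction. Sets are used, so stoichiometry is ignored.
    static Set<String> inputs(String[][][] rxns) {
        Set<String> consumed = new TreeSet<>();
        Set<String> produced = new TreeSet<>();
        for (String[][] r : rxns) {
            consumed.addAll(Arrays.asList(r[0]));
            produced.addAll(Arrays.asList(r[1]));
        }
        consumed.removeAll(produced);   // consumed but never produced
        return consumed;
    }

    static Set<String> outputs(String[][][] rxns) {
        Set<String> consumed = new TreeSet<>();
        Set<String> produced = new TreeSet<>();
        for (String[][] r : rxns) {
            consumed.addAll(Arrays.asList(r[0]));
            produced.addAll(Arrays.asList(r[1]));
        }
        produced.removeAll(consumed);   // produced but never consumed
        return produced;
    }

    public static void main(String[] args) {
        String[][][] rxns = {
            { {"A"}, {"B", "C"} },   // hypothetical: A -> B + C
            { {"B"}, {"D"} }         // hypothetical: B -> D
        };
        System.out.println(inputs(rxns));    // prints [A]
        System.out.println(outputs(rxns));   // prints [C, D]
    }
}
```

B is both produced and consumed inside the pathway, so it appears in neither result.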
101
Example Queries
- Find all genes involved in metabolic pathways:

    (remove-duplicates
      (loop for p in (all-pathways)
            append (genes-of-pathway p)))

- Find all compounds that are unique to a single pathway:

    (loop for p in (base-pathways)
          append (loop for c in (substrates-of-pathway p)
                       when (null (remove p (pathways-of-compound c)))
                       collect (list c p)))
102
SRI International Bioinformatics 102 Regulation Significant recent expansion of regulation in Pathway Tools Class Regulation with subclasses that describe different biochemical mechanisms of regulation Slots: l Regulator l Regulated-Entity l Mode l Mechanism
103
SRI International Bioinformatics 103 Regulation of Enzyme Activity Class Regulation-of-Enzyme-Activity Each instance of the class describes one regulatory interaction Slots: l Regulator -- usually a small molecule l Regulated-Entity -- an Enzymatic-Reaction l Mechanism -- One of: u Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Unkmech, Other l Mode -- One of: +, -
104
Transcription Initiation
Class Regulation-of-Transcription-Initiation
Slots:
  - Regulator -- instance of Proteins or Complexes (a transcription factor)
  - Regulated-Entity -- instance of Promoters or Transcription-Units or Genes
  - Mode -- One of: +, -
105
Attenuation
Class Transcriptional-Attenuation
Several subclasses depending on type of attenuation
Slots common to all:
  - Regulator -- Depends on subtype of attenuation
  - Regulated-Entity -- instance of Terminators or Genes or Transcription-Units
  - Mode -- One of: +, -
106
Attenuation Subtypes
Small-Molecule-Mediated-Attenuation
  - Regulator = A small molecule
  - Leader transcript binds small molecule and determines formation of terminator or antiterminator
RNA-Polymerase-Modification
  - Regulator = instance of Proteins or Complexes
  - Regulatory protein binds to site in transcription unit and interacts with RNA polymerase to determine termination
RNA-Mediated-Attenuation
Ribosome-Mediated-Attenuation
Rho-Blocking-Antitermination
Protein-Mediated-Attenuation
107
Transcriptional Regulation
[Figure: diagram of transcriptional regulation of the trp operon. Labels recovered from the diagram: trpLEDCBA, trpL, trpE, trpD, trpC, trpB, trpA, pro001, site001, RpoSig70, TrpR*trp, apoTrpR, trp, Int001, Int003, Int005]
108
BioWarehouse: A Bioinformatics Database Warehouse
Peter D. Karp, Thomas J. Lee, Valerie Wagner
[Figure: loaders feed source databases into BioWarehouse, hosted on Oracle (10g) or MySQL (4.1.11). Sources shown: UniProt, ENZYME, Genbank, Taxonomy, BioCyc, BioPAX, GO, MAGE-ML, KEGG, CMR, Eco2DBase. Legend: Java-based loader, C-based loader]
109
Motivations
Hundreds of bioinformatics DBs exist
Important problems involve queries across multiple DBs
110
Technical Approach
Multi-platform support: Oracle (10g) and MySQL
Schema support for a multitude of bioinformatics datatypes
Create loaders for public bioinformatics DBs
  - Parse file format of the source DB
  - Semantic transformations
  - Insert DB contents into warehouse tables
Provide Warehouse query access mechanisms
  - SQL queries via ODBC, JDBC, OAA
Operate public BioWarehouse server: publichouse
BMC Bioinformatics 7:170 2006
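The three loader steps (parse, transform, insert) can be sketched in a few lines of Common Lisp. This is only an illustration under stated assumptions, not the BioWarehouse loader code: the "ATTRIBUTE - VALUE" line layout with "//" ending each record, the attribute names, the :id and :name columns, and all function names are hypothetical, and a plain list of rows stands in for SQL inserts into warehouse tables.

```lisp
;; Hypothetical sketch of the three loader steps on attribute-value input.

(defun parse-line (line)
  "Split \"ATTR - VALUE\" into (ATTR . VALUE); return NIL if no separator."
  (let ((pos (search " - " line)))
    (when pos
      (cons (subseq line 0 pos) (subseq line (+ pos 3))))))

(defun parse-records (lines)
  "Step 1: group parsed lines into records, one association list per record.
A record is emitted only when its closing \"//\" line is seen."
  (let ((records '())
        (current '()))
    (dolist (line lines (nreverse records))
      (if (string= line "//")
          (progn (push (nreverse current) records)
                 (setf current '()))
          (let ((pair (parse-line line)))
            (when pair (push pair current)))))))

(defun transform-record (record)
  "Step 2: map source attributes onto assumed warehouse column names."
  (list :id   (cdr (assoc "UNIQUE-ID" record :test #'string=))
        :name (cdr (assoc "COMMON-NAME" record :test #'string=))))

(defun load-records (lines)
  "Step 3: collect the transformed rows, standing in for SQL inserts."
  (mapcar #'transform-record (parse-records lines)))
```

Keeping the semantic transformation in its own function means the parsing step stays specific to one source format while the row layout can be shared across loaders; whether the real loaders are organized this way is not stated in the slides.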
111
BioWarehouse Schema
Manages many bioinformatics datatypes simultaneously
  - Pathways, Reactions, Chemicals
  - Proteins, Genes, Replicons
  - Sequences, Sequence Features
  - Organisms, Taxonomic relationships
  - Computations (sequence matches)
  - Citations, Controlled vocabularies
  - Links to external databases
Each type of warehouse object implemented through one or more relational tables (currently 43)
112
Warehouse Schema
Manages multiple datasets simultaneously
  - Dataset = Single version of a database
Version comparison
Multiple software tools or experiments that require access to different versions
Each dataset is a warehouse entity
Every warehouse object is registered in a dataset
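Registering every object in a dataset is what makes version comparison a simple query. The sketch below illustrates the idea in plain Common Lisp with an in-memory list; the row layout (object-id dataset), the dataset names, and the functions objects-in-dataset and added-between are hypothetical and do not reflect the actual warehouse tables.

```lisp
;; Hypothetical in-memory stand-in for warehouse rows: each entry is
;; (object-id dataset), so every object is registered in one dataset.
(defparameter *toy-objects*
  '(("P001" biocyc-v1) ("P002" biocyc-v1)
    ("P002" biocyc-v2) ("P003" biocyc-v2)))

(defun objects-in-dataset (dataset objects)
  "IDs of all objects registered in DATASET."
  (loop for (id ds) in objects
        when (eq ds dataset)
          collect id))

(defun added-between (old new objects)
  "IDs registered in dataset NEW but absent from dataset OLD."
  (set-difference (objects-in-dataset new objects)
                  (objects-in-dataset old objects)
                  :test #'string=))
```

With two versions of the same source database loaded side by side, (added-between 'biocyc-v1 'biocyc-v2 *toy-objects*) selects the objects that appear only in the newer version.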
113
BioWarehouse Loaders
Database       Loader Language  Input Format            Comments
BioCyc         C                BioCyc attribute-value  Pathway/Genome Databases
BioPAX         Java             BioPAX format           Protein interactions data
CMR            C                CMR column-delimited    Comprehensive Microbial Resource: 350+ microbial genomes
Eco2Dbase      Java             Relational table dumps  E. coli 2-D gel data
ENZYME         Java             ENZYME attribute-value  Enzyme Commission set of reactions
Genbank        Java             XML derived from ASN.1  Bacterial subset of Genbank
Gene Ontology  Java             OBO XML                 Hierarchical controlled vocabulary
KEGG           C                KEGG format             Metabolic pathway data
MAGE-ML        Java             MAGE-ML format          Microarray gene expression data
NCBI Taxonomy  C                Taxonomy format         Organism taxonomy
UniProt        Java             UniProt XML             SWISS-PROT and TrEMBL
114
Acknowledgements
SRI
  - Michelle Green, Ron Caspi, Ingrid Keseler, John Pick, Carol Fulcher, Markus Krummenacker, Alex Shearer
EcoCyc Collaborators
  - Julio Collado-Vides, John Ingraham, Ian Paulsen
MetaCyc Collaborators
  - Sue Rhee, Peifen Zhang, Hartmut Foerster, Chris Tissier
BioCyc Collaborators
  - Christos Ouzounis and EBI CGG
Funding sources:
  - NIH National Center for Research Resources
  - NIH National Institute of General Medical Sciences
  - NIH National Human Genome Research Institute
  - Department of Energy Microbial Cell Project
  - DARPA BioSpice
BioCyc.org
Learn more from BioCyc webinars: biocyc.org/webinar.shtml
115
Chokepoint Example for Antibiotic Target Development
Find strategic, essential weak links in metabolism
Many compounds have just one producing and one consuming reaction

(defun chokepoint-1 ()
  ;; For every compound that appears on the left side of exactly one
  ;; reaction and on the right side of exactly one reaction, collect
  ;; both of those reactions; duplicates are removed at the end.
  (remove-duplicates
   (loop for cpd in (remove-if-not #'coercible-to-frame-p
                                   (all-substrates (all-rxns)))
         when (= 1
                 (length (get-slot-values cpd 'APPEARS-IN-LEFT-SIDE-OF))
                 (length (get-slot-values cpd 'APPEARS-IN-RIGHT-SIDE-OF)))
           collect (get-slot-value cpd 'APPEARS-IN-LEFT-SIDE-OF)
           and collect (get-slot-value cpd 'APPEARS-IN-RIGHT-SIDE-OF))
   :test #'fequal))

;;; invoking the query:
(length (chokepoint-1))
==> 348
116
Substring Search Example
Find all genes that contain a given substring within their common name or synonym list.

(defun find-gene-by-substring (substring)
  (let (result)
    (loop for g in (get-class-all-instances '|Genes|)
          do (loop for name in (get-slot-values g 'names)
                   ;; Case-insensitive match anywhere in the name.
                   when (search substring name :test #'char-equal)
                     do (pushnew g result)))
    result))