Computing with Pathway/Genome Databases

Slides:



Advertisements
Similar presentations
Números.
Advertisements

AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
Simplifications of Context-Free Grammars
PDAs Accept Context-Free Languages
ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala
Introduction to LISP Programming of Pathway Tools Queries and Updates.
Editing Pathway/Genome Databases. SRI International Bioinformatics Pathway Tools Paradigm Separate database from user interface Navigator provides one.
1 SRI International Bioinformatics The Ocelot Frame Knowledge Representation System Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International.
EuroCondens SGB E.
Slide 1Fig 26-CO, p.795. Slide 2Fig 26-1, p.796 Slide 3Fig 26-2, p.797.
& dding ubtracting ractions.
Sequential Logic Design
Copyright © 2013 Elsevier Inc. All rights reserved.
David Burdett May 11, 2004 Package Binding for WS CDL.
Create an Application Title 1Y - Youth Chapter 5.
Add Governors Discretionary (1G) Grants Chapter 6.
CALENDAR.
CHAPTER 18 The Ankle and Lower Leg
The 5S numbers game..
突破信息检索壁垒 -SciFinder Scholar 介绍
A Fractional Order (Proportional and Derivative) Motion Controller Design for A Class of Second-order Systems Center for Self-Organizing Intelligent.
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
The basics for simulations
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Factoring Quadratics — ax² + bx + c Topic
EE, NCKU Tien-Hao Chang (Darby Chang)
PP Test Review Sections 6-1 to 6-6
1 IMDS Tutorial Integrated Microarray Database System.
Briana B. Morrison Adapted from William Collins
Introduction to Structured Query Language (SQL)
Regression with Panel Data
Dynamic Access Control the file server, reimagined Presented by Mark on twitter 1 contents copyright 2013 Mark Minasi.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Copyright © [2002]. Roger L. Costello. All Rights Reserved. 1 XML Schemas Reference Manual Roger L. Costello XML Technologies Course.
Progressive Aerobic Cardiovascular Endurance Run
Biology 2 Plant Kingdom Identification Test Review.
Chapter 1: Expressions, Equations, & Inequalities
FAFSA on the Web Preview Presentation December 2013.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
Facebook Pages 101: Your Organization’s Foothold on the Social Web A Volunteer Leader Webinar Sponsored by CACO December 1, 2010 Andrew Gossen, Senior.
When you see… Find the zeros You think….
2011 WINNISQUAM COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=1021.
Before Between After.
2011 FRANKLIN COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=332.
Slide R - 1 Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Prentice Hall Active Learning Lecture Slides For use with Classroom Response.
1 GIS Maps and Tax Roll Submission. 2 Exporting A New Shapefile.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Figure 10–1 A 64-cell memory array organized in three different ways.
Types of selection structures
Static Equilibrium; Elasticity and Fracture
1 © 2004, Cisco Systems, Inc. All rights reserved. CCNA 1 v3.1 Module 9 TCP/IP Protocol Suite and IP Addressing.
Chapter 11 Creating Framed Layouts Principles of Web Design, 4 th Edition.
Lial/Hungerford/Holcomb/Mullins: Mathematics with Applications 11e Finite Mathematics with Applications 11e Copyright ©2015 Pearson Education, Inc. All.
SRI International Bioinformatics Data Import / Export Markus Krummenacker Bioinformatics Research Group SRI, International Q
1.step PMIT start + initial project data input Concept Concept.
A Data Warehouse Mining Tool Stephen Turner Chris Frala
1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Introduction Embedded Universal Tools and Online Features 2.
The Pathway Tools Schema. SRI International Bioinformatics Motivations for Understanding Schema Pathway Tools visualizations and analyses depend upon.
SRI International Bioinformatics 1 The PerlCyc and JavaCyc APIs.
The Pathway Tools Schema. SRI International Bioinformatics Motivations for Understanding Schema Pathway Tools visualizations and analyses depend upon.
Writing Programs that Analyze Pathway/Genome Databases Markus Krummenacker Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org.
SRI International Bioinformatics 1 The Structured Advanced Query Page Mario Latendresse Tomer Altman Bioinformatics Research Group SRI International March,
Recent Developments and Future Directions in Pathway Tools Peter D. Karp SRI International.
The Pathway Tools Schema
Presentation transcript:

Computing with Pathway/Genome Databases

Aprox presentation time: 1.5 hrs

Overview Summary of Pathway Tools data access mechanisms and formats Pathway Tools APIs Overview of Pathway Tools schema

Motivations to Understanding Schema When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema Queries using Lisp, Perl, Java APIs Queries using Structured Advanced Query Form Queries using BioVelo Find all monomers longer than 1,000 amino acids (loop for g in (get-class-all-instances ‘|Genes|) when (< 1000 (abs (- (get-slot-value g ‘left-end-position) (get-slot-value g ‘right-end-position) )) collect (get-slot-value g ‘product) )

Pathway Tools Implementation Details Platforms: Macintosh, PC/Linux, and PC/Windows platforms Same binary can run as desktop app or Web server Production-quality software Version control Two regular releases per year Extensive quality assurance Extensive documentation Auto-patch Automatic DB-upgrade 420,000 lines of Lisp code

More Information Pathway Tools Web Site, Tutorial Slides http://bioinformatics.ai.sri.com/ptools/ PTools APIs: http://brg.ai.sri.com/ptools/ptools-resources.html Web services: http://biocyc.org/web-services.shtml Guide to the Pathway Tools Schema http://biocyc.org/schema.shtml Curator's Guide http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf

References Ontology Papers section of http://biocyc.org/publications.shtml "An Evidence Ontology for use in Pathway/Genome Databases" "An ontology for biological function based on molecular interactions" "Representations of metabolic knowledge: Pathways" "Representations of metabolic knowledge"

Data Exchange APIs: Lisp API, Java API, and Perl API Read and modify access Web services Cyclone Export to files BioPAX Export Biopax.org Export PGDB genome to Genbank format Export entire PGDB as column-delimited and attribute-value file formats Export PGDB reactions as SBML -- sbml.org Import/Export of Pathways: between PGDBs Import/Export of Selected Frames, for Spreadsheets Import/Export of Compounds as Molfile, CML BioWarehouse : Loader for Flatfiles, SQL access http://bioinformatics.ai.sri.com/biowarehouse/ BMC Bioinformatics 7:170 2006

Pathway Tools Ontology / Schema Ontology classes: 1621 Datatype classes: Define objects from genomes to pathways Classification systems for pathways, chemical compounds, enzymatic reactions (EC system) Protein Feature ontology Controlled vocabularies: Cell Component Ontology Evidence codes Comprehensive set of 279 attributes and relationships

High-Level Classes in the Pathway Tools Ontology Chemicals -- All molecules Polymer-Segments -- Regions of polymers Protein-Features -- Features on proteins Organisms Reactions -- Biochemical reactions Enzymatic-Reactions -- Link enzymes to reactions they catalyze Pathways -- Metabolic and signaling pathways Regulation -- Regulatory interactions CCO -- Cell Component Ontology Evidence -- Evidence ontology Gene-Ontology-Terms -- GO Growth-Observations -- Observations of growth of organism Notes -- Timestamped, person-stamped notes Organizations, People Publications

Navigating the Schema

Use GKB Editor to Inspect the Pathway Tools Ontology GKB Editor = Generic Knowledge Base Editor Type in Navigator window: (GKB) or [Right-Click] Edit->Ontology Editor View->Browse Class Hierarchy [Middle-Click] to expand hierarchy To view classes or instances, select them and: Frame -> List Frame Contents Frame -> Edit Frame

Use the SAQP to Inspect the Schema

Pathway Tools Schema Guide to the Pathway Tools Schema Schema overview diagram

Principal Classes Class names are capitalized, plural, separated by dashes Genetic-Elements, with subclasses: Chromosomes Plasmids Genes Transcription-Units RNAs rRNAs, snRNAs, tRNAs, Charged-tRNAs Proteins, with subclasses: Polypeptides Protein-Complexes

Principal Classes Reactions, with subclasses: Transport-Reactions Enzymatic-Reactions Pathways Compounds-And-Elements

Principal Classes Regulation

Slot Links TCA Cycle in-pathway Succinate + FAD = fumarate + FADH2 Enzymatic-reaction Succinate dehydrogenase reaction catalyzes component-of Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2 sdhA sdhB sdhC sdhD product

Programmatic Access to BioCyc Common LISP Native language of Pathway Tools Interactive & Mature Environment Full Access to the Data & Many Utility Functions Source code is available for academics PerlCyc API of Functions, Exposed to Perl Communication through UNIX Socket JavaCyc API of Functions, Exposed to Java Cyclone

Cyclone Developed by Schachter and colleagues from Genoscope http://nemo-cyclone.sourceforge.net/archi.php Cyclone is a Java-based system that: Extracts data from a Pathway Tools PGDB Converts it to an XML schema Maps the data to Java objects and to a relational database Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB

Lisp API Accessible whenever you start Pathway Tools with the –lisp argument Lisp queries evaluate against the running Pathway Tools binary and execute very fast

Ocelot Object Database

Pathway Tools Implementation Details Platforms: Macintosh, PC/Linux, and PC/Windows platforms Same binary can run as desktop app or Web server Production-quality software Version control Two regular releases per year Extensive quality assurance Extensive documentation Auto-patch Automatic DB-upgrade 600,000 lines of Lisp code

Pathway Tools Architecture Genome Navigator Web Mode Desktop Mode Lisp Perl Java Protein Editor Pathway Editor Reaction Editor GFP API Oracle or MySQL Disk File Ocelot DBMS

Ocelot Object Database Frame data model Classes, instances, inheritance Frames have slots that define their properties, attributes, relationships A slot has one or more values Datatypes include numbers, strings, etc. Slotunit frames define metadata about slots: Domain, range, inverse Collection type, number of values, value constraints

Storage System Architecture File KBs Read-only applications can be distributed without a relational DBMS Load all objects and code into Lisp memory Dump virtual memory to binary executable file

Ocelot Storage System Architecture Persistent storage via disk files, MySQL or Oracle DBMS Concurrent development: MySQL or Oracle Single-user development: disk files Relational DBMS storage RDBMS is submerged within Ocelot, invisible to users Frames transferred from RDBMS to Ocelot On demand By background prefetcher Memory cache Persistent disk cache to speed performance via Internet

Transaction Logging Relational DBMS stores The latest version of each Ocelot frame A log of all GFP operations applied to KB Transaction log enables: Reconstruction of earlier versions of KB View history of changes to an object Update replicates of a KB Detection of update conflicts during concurrency control Undo of updates

Optimistic Concurrency Control Locking approach: edits to one object can require locking all connected objects No locking User performs updates in local workspace When user commits changes, storage system compares user changes against all other committed changes

Ocelot Knowledge Server Schema Evolution FRSs store and process class and instance information similarly Application can query schema information as easily as it can query instances Schema is stored within the DB Schema is self documenting Schema evolution facilitated by Easy addition/removal of slots, or alteration of slot datatypes Flexible data formats that do not require dumping/reloading of data

Generic Frame Protocol (GFP) A library of procedures for accessing Ocelot DBs GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html A small number of GFP functions are sufficient for most complex queries

Example of a Single GFP Call The General Pattern: gfp-function(frame slot value ...) (gfp-function frame slot value …) LISP (get-slot-values 'TRYPSYN-RXN 'LEFT) ==> (INDOLE-3-GLYCEROL-P SER)

Frame References At the GFP level, every Ocelot frame can be referred to using either symbol frame name or frame object Most GFP functions return frame objects Importance of using fequal for comparisons

Generic Frame Protocol get-class-all-instances (Class) Returns direct and indirect instances of Class coercible-to-frame-p (Thing) Is Thing a frame? Returns True if Thing is the name of a frame, or a frame object; else False

Generic Frame Protocol Notation Frame.Slot means a specified slot of a specified frame. Note: Slot must be a symbol! get-slot-value(Frame Slot) Returns first value of Frame.Slot get-slot-values(Frame Slot) Returns all values of Frame.Slot as a list slot-has-value-p(Frame Slot) Returns True if Frame.Slot has at least one value; else False member-slot-value-p(Frame Slot Value) Returns True if Value is one of the values of Frame.Slot; else False Instance-all-instance-of-p(Instance Class) Returns True if Instance is an all-instance of Class

Generic Frame Protocol print-frame(Frame) Prints the contents of Frame

Generic Frame Protocol – Update Operations put-slot-value(Frame Slot Value) Replace the current value(s) of Frame.Slot with Value put-slot-values(Frame Slot Value-List) Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values add-slot-value(Frame Slot Value) Add Value to the current value(s) of Frame.Slot, if any remove-slot-value(Frame Slot Value) Remove Value from the current value(s) of Frame.slot replace-slot-value(Frame Slot Old-Value New-Value) In Frame.Slot, replace Old-Value with New-Value remove-local-slot-values(Frame Slot) Remove all of the values of Frame.Slot

Generic Frame Protocol – Update Operations save-kb Saves the current KB

Additional Pathway Tools Functions – Semantic Inference Layer Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

PerlCyc and JavaCyc Work on Unix (Solaris or Linux) only Start up Pathway Tools with the –api arg Pathway Tools listens on a Unix socket – perl program communicates through this socket Supports both querying and editing PGDBs Must run perl or java program on the same machine that runs Pathway Tools This is a security measure, as the API server has no built-in security Can only handle one connection at a time

Obtaining PerlCyc and JavaCyc Download from http://www.sgn.cornell.edu/downloads/ PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research. JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller. Easy to extend…

Examples of PerlCyc, JavaCyc Functions getSlotValues getClassAllInstances putSlotValues genesOfReaction findIndexedFrame pathwaysOfGene transportP GFP functions (require knowledge of Pathway Tools schema): get_slot_values get_class_all_instances put_slot_values Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html): genes_of_reaction find_indexed_frame pathways_of_gene transport_p

Writing a PerlCyc or JavaCyc program Create a PerlCyc, JavaCyc object: perlcyc -> new (“ORGID”) new Javacyc (“ORGID”) Call PerlCyc, JavaCyc functions on this object: my $cyc = perlcyc -> new (“ECOLI”); my @pathways = $cyc -> all_pathways (); Javacyc cyc = new Javacyc(“ECOLI”); ArrayList pathways = cyc.allPathways (); Functions return object IDs, not objects. Must connect to server again to retrieve attributes of an object. foreach my $p (@pathways) { print $cyc -> get_slot_value ($p, “COMMON-NAME”);} for (int i=0; I < pathways.size(); i++) { String pwy = (String) pathways.get(i); System.out.println (cyc.getSlotValue (pwy, “COMMON-NAME”); }

Sample PerlCyc Query Number of proteins in E. coli use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); my @proteins = $cyc-> get_class_all_instances("|Proteins|"); my $protein_count = scalar(@proteins); print "Protein count: $protein_count.\n";

Sample PerlCyc Query Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5. use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); foreach my $p ($cyc->get_class_all_instances("|Proteins|")) { my $mw = $cyc->get_slot_value($p, "molecular-weight-kd"); my $pI = $cyc->get_slot_value($p, "pi"); if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) { print "$p\n"; }

Sample PerlCyc Query List all the transcription factors in E. coli, and the list of genes that each regulates: use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); foreach my $p ($cyc->get_class_all_instances("|Proteins|")) { if ($cyc->transcription_factor_p($p)) { my $name = $cyc->get_slot_value($p, "common-name"); my %genes = (); foreach my $tu ($cyc->regulon_of_protein($p)) { foreach my $g ($cyc->transcription_unit_genes($tu)) { $genes{$g} = $cyc->get_slot_value($g, "common-name"); } print "\n\n$name: "; print join " ", values %genes;

Sample Editing Using PerlCyc Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases) use perlcyc; my $cyc = perlcyc -> new (“HPY”); my @genes = $cyc->get_class_all_instances (“|Genes|”); foreach my $g (@genes) { $cyc->add_slot_value ($g, “DBLINKS”, “(MY-DB \”$g\”)”); } $cyc->save_kb();

Sample JavaCyc Query: Enzymes for which ATP is a regulator import java.util.*; public class JavacycSample { public static void main(String[] args) { Javacyc cyc = new Javacyc("ECOLI"); ArrayList regframes = cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|"); for (int i = 0; i < regframes.size(); i++) { String reg = (String)regframes.get(i); boolean bool = cyc.memberSlotValueP(reg, “Regulator", "ATP"); if (bool) { String enzrxn = cyc.getSlotValue (reg, “Regulated-Entity”); String enzyme = cyc.getSlotValue (enzrxn, “Enzyme”); System.out.println(enz); } } } }

Simple Lisp Query Example: Enzymes for which ATP is a regulator (defun atp-inhibits () (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|) ;; Does the Regulator slot contain the compound ATP, and the mode ;; of regulation is negative (inhibition)? when (and (member-slot-value-p x ‘Regulator 'ATP) (member-slot-value-p x ‘Mode “-”) ) ;; Whenever the test is positive, we collect the value of the slot Enzyme ;; of the Regulated-Entity of the regulatory interaction frame. ;; The collected values are returned as a list, once the loop terminates. collect (get-slot-value (get-slot-value x ‘Regulated-Entity) ‘Enzyme) ) ) ;;; invoking the query: (select-organism :org-id 'ECOLI) (atp-inhibits) (get-slot-values 'TRYPSYN-RXN 'LEFT) ==> (INDOLE-3-GLYCEROL-P SER)

Simple Perl Query Example: Enzymes for which ATP is a regulator use perlcyc; my $cyc = perlcyc -> new("ECOLI"); my @regs = $cyc -> get_class_all_instances("|Regulation-of-Enzyme- Activity|"); ## We check every instance of the class foreach my $reg (@regs) { ## We test for whether the INHIBITORS-ALL ## slot contains the compound frame ATP my $bool1 = $cyc -> member_slot_value_p($reg, “Regulator", "Atp"); my $bool2 = $cyc -> member_slot_value_p($reg, “Mode", “-"); if ($bool1 && $bool2) { ## Whenever the test is positive, we collect the value of the slot ENZYME . ## The results are printed in the terminal. my $enzrxn = $cyc -> get_slot_value($reg, “Regulated-Entity"); my $enz = $cyc -> get_slot_value($enzrxn, "Enzyme"); print STDOUT "$enz\n"; }

Getting started with Lisp pathway-tools –lisp (load “file”) (compile-file “file.lisp”) Emacs is a useful editor Pathway Tools source code is available: ask Overview of Lisp information resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html Documented Pathway Tools Lisp functions: http://brg.ai.sri.com/ptools/ptools-fns.html

Viewing Results via the Answer List (loop for r in (get-class-all-instances '|Reactions|) when (< 3 (length (get-slot-values r 'left))) collect r) (setq answer *) (object-table answer) (replace-answer-list answer) (pt) Next Answer

Query Gotchas Study schema carefully :test #’fequal Cascade of slot-values: check for NIL

Semantic Inference Layer relationships.lisp Library of functions that encapsulate common query building blocks and intricacies of navigating the schema enzymes-of-gene reactions-of-gene pathways-of-gene genes-of-pathway pathway-hole-p reactions-of-compound top-containers(protein) all-rxns(type) (:metab-smm :metab-all :metab-pathways :enzyme :transport etc.) (all-rxns :metab-pathways)

Pathway Tools Schema and Semantic Inference Layer Genes, Operons, and Replicons

Representing a Genome Gene1 Product1 Gene2 CHROM1 Gene3 CHROM2 ORG components Gene1 Product1 CHROM1 Gene2 genome Gene3 ORG CHROM2 PLASMID1 Classes: ORG is of class Organisms CHROM1 is of class Chromosomes PLASMID1 is of class Plasmids Gene1 is of class Genes Product1 is of class Polypeptides or RNA

Review slots of COLI and of COLI-K12 Polynucleotides Review slots of COLI and of COLI-K12

Genetic-Elements Sequence is stored in a separate file or database table

Polymer-Segments Review slots of Genes

Complexities of Gene / Gene-Product Relationships The Product of a gene can be an instance of Polypeptides or RNAs An instance of Polypeptides can have more than one gene encoding it Sequence position: Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (usually greater, except at origin) Transcription-Direction + / - Alternative splicing: Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position Intron positions specified in Splice-Form-Introns of gene product (200 300) (350 400)

Gene Reaction Schematic

Exercises Find all genes on a given chromosome Find all ribosomal RNAs Find the DNA sequence of a given gene Find all proteins longer than 1,000 amino acids

Exercises Find all genes on a given chromosome (defun genes-of-chrom (chrom) (loop for x in (get-slot-values chrom ‘components) when (instance-all-instance-of-p x ‘|Genes|) collect x) ) Find all ribosomal RNAs (get-class-all-instances ‘|rRNAs|) Find the DNA sequence of a given gene (get-gene-sequence gene)

Exercises Find all monomers longer than 1,000 nucleotides (loop for g in (get-class-all-instances ‘|Genes|) for p = (get-slot-value g ‘product) when (and (< 1000 (abs (- (get-slot-value g ‘left-end-position) (get-slot-value g ‘right-end-position) ))) (instance-all-instance-of-p p ‘|Polypeptides|) ) collect p )

Proteins

Proteins and Protein Complexes Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level) Protein complex: proteins consisting of multiple polypeptides or protein complexes Example: DNA pol III DnaE is a polypeptide pol III core is DnaE and two other polypeptides pol III holoenzymes is several protein complexes combined

Protein Complex Relationships

Slots of a protein (DnaE) catalyzes Is it an activator/reactant/etc? comments component-of dblinks features (edited in feature editor) Many other features possible

A complex at the frame level (pol III) Same features as polypeptide frame, different use comment component-of and components note coefficients

Protein Complex Relationships

Relationships are Defined in Many Places component-of comes from creating a complex appears-in-left-side-of comes from defining a reaction (as do modified forms) inhibitor-of comes from an enzymatic reaction can only edit dna-footprint if protein has been associated with a TU

Semantic Inference Layer Reactions-of-protein (prot) Returns a list of rxns this protein catalyzes Transcription-units-of-proteins(prot) Returns a list of TU’s activated/inhibited by the given protein Transporter? (prot) Is this protein a transporter? Polypeptide-or-homomultimer?(prot) Transcription-factor? (prot) Obtain-protein-stats Returns 5 values Length of : all-polypeptides, complexes, transporters, enzymes, etc…

Example Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group (loop for protein in (get-class-all-instances ‘|Proteins|) for enzrxn = (get-slot-value protein ‘enzymatic-reaction) when (and enzrxn (or (member-slot-value-p enzrxn ‘cofactors ‘pyridoxal_phosphate) (member-slot-value-p enzrxn ‘prosthetic-groups ‘pyridoxal_phosphate)) collect protein) (member-slot-value-p frame slot value) : T if Value is one of the values of Slot of Frame.

Example Queries Find all homomultimers Find proteins whose pI > 10, and that reside on the negative strand of the first chromosome

Sample Find all proteins without a comment anywhere

Compounds / Reactions / Pathways

Compounds / Reactions / Pathways Think of a three tiered structure: Reactions built on top of compounds Pathways built on top of reactions Metabolic network defined by reactions alone; pathways are an additional “optional” structure Some reactions not part of a pathway Some reactions have no attached enzyme Some enzymes have no attached gene

Compounds

Compounds Relatively few aspects of a compound defined within the compound editor MW, formula calculated from edited structure Most aspects defined in other editors “Pathway reactions” comes from reaction editing followed by pathway editing Activator, etc come from the enzymatic reaction editor

Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids| -- Instance TRP --- Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids| APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2) COMMON-NAME: "L-tryptophan" DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3") NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid" SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))" SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", ____________________________________________

Where is diphosphate in the ontology?

Semantic Inference Layer Reactions-of-compound (cpd) Pathways-of-compound (cpd) Is-substrate-an-autocatalytic-enzyme-p (cpd) Activated/inhibited-by? (cpds slots) Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric) All-substrates (rxns) All unique substrates specified in the given rxns Has-structure-p (cpd) Obtain-cpd-stats Returns two values: Length of :all-cpds, cpds with structures

Miscellaneous things…. History List Back/Forward and History buttons Default list is 50 items Show frame (print-frame ‘frame)

Queries with Multiple Answers Navigator queries: Example: Substring search for “pyruvate” Selected list is placed on the Answer list Use “Next Answer” button to view each one of them Lisp queries: Example : Find reactions involving pyruvate as a substrate (get-class-all-instances ‘|Compounds|) (loop for rxn in (get-class-all-instances ‘|Reactions|) when (member ‘pyruvate (get-slot-values rxn ‘substrates) collect rxn) (replace-answer-list * )