Integration of E. coli Data (E. coli Pathway and Genomic Data from BioCyc)
Jesse Walsh
Outline
Description of BioCyc data
– Format
– Key Classes
How I am retrieving and storing the data
– SPDB schema
– Key tables
Recent Developments
BioCyc Data Format
Frames are made of slots
– Slots are made of facets
– Slot values can have annotations
[Diagram: example frame "Reaction X" with slots Common Name, EC #, Reactants; annotations Coefficient, Compartment; facets :VALUE-TYPE, :DOCUMENTATION]
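The frame/slot/facet/annotation layering can be sketched as plain data classes. This is a minimal illustration, not BioCyc's actual implementation: the class names `Frame`, `Slot`, and `SlotValue` and all example values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SlotValue:
    """One value stored in a slot, with optional per-value annotations."""
    value: Any
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Slot:
    """A named slot: facets describe the slot itself, values hold the data."""
    name: str
    facets: Dict[str, Any] = field(default_factory=dict)
    values: List[SlotValue] = field(default_factory=list)


@dataclass
class Frame:
    """A frame (object) identified by an ID and made of slots."""
    frame_id: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add_slot(self, slot: Slot) -> None:
        self.slots[slot.name] = slot


# Illustrative frame; the values are placeholders, not real BioCyc data.
rxn = Frame("Reaction X")
rxn.add_slot(Slot("Common Name", values=[SlotValue("example reaction")]))
rxn.add_slot(Slot(
    "Reactants",
    facets={":VALUE-TYPE": "Compound", ":DOCUMENTATION": "Substrates"},
    values=[SlotValue("compound-a", {"Coefficient": 2, "Compartment": "cytosol"})],
))
```

Annotations hang off an individual value (here, the coefficient of one reactant), while facets describe the slot as a whole.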
BioCyc Class Hierarchy… Complicated
Key Classes in BioCyc
– Genes
– Proteins
– Polypeptides (a subclass of Proteins)
– Protein-Complexes (a subclass of Proteins)
– Pathways
– Reactions
– Compounds-And-Elements
– Enzymatic-Reactions
– Transcription-Units
– Promoters
Why not just use BioCyc?
Advantages:
– Fast access to individual objects
– Logic-based assertions
Disadvantages:
– Hard to query
– Difficult to understand the structures
– Difficult to know everything that is in the database
– Difficult to integrate other types of data
Solution:
– Create a relational database
SPDB Schema
SPDB: Simple Pathway DataBase
Pathway
“Central” table
Allows organization of major pathways
Easy to retrieve a pathway, or all reactions that share a pathway with a specified reaction
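The "reactions that share a pathway with a specified reaction" lookup can be sketched as a self-join on a pathway-to-reaction link table. The table and column names (`pathway`, `reaction`, `pathway_reaction`) are assumptions for illustration, not the actual SPDB schema, and SQLite stands in for MySQL so the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathway  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical link table: which reactions belong to which pathway.
CREATE TABLE pathway_reaction (
    pathway_id  INTEGER REFERENCES pathway(id),
    reaction_id INTEGER REFERENCES reaction(id)
);
INSERT INTO pathway  VALUES (1, 'pathway-A'), (2, 'pathway-B');
INSERT INTO reaction VALUES (10, 'rxn-1'), (11, 'rxn-2'), (12, 'rxn-3');
INSERT INTO pathway_reaction VALUES (1, 10), (1, 11), (2, 12);
""")


def reactions_sharing_pathway(conn, reaction_id):
    """Names of the other reactions in any pathway containing reaction_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.name
        FROM pathway_reaction AS mine
        JOIN pathway_reaction AS other ON other.pathway_id = mine.pathway_id
        JOIN reaction AS r ON r.id = other.reaction_id
        WHERE mine.reaction_id = ? AND other.reaction_id <> ?
        ORDER BY r.name
        """,
        (reaction_id, reaction_id),
    ).fetchall()
    return [name for (name,) in rows]
```

With the placeholder rows above, `reactions_sharing_pathway(conn, 10)` finds the one other reaction in pathway-A.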
Reaction
Reaction types include:
– Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor
Transcription, Translation, Promoter, and TF reactions are all inferred reactions
Reactions are the “nodes” of networks in SPDB
Entity
Entities include:
– Compound, Protein (Complex/Monomer), Gene, Transcription Unit, Promoter
Entities with multiple types are represented with the most specific type in their hierarchy
– e.g. a protein that is also a complex will be listed as “Complex”, not “Protein”
– “Enzyme” status is stored as a participation type
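The "most specific type" rule can be sketched as picking the candidate type that sits deepest in the class hierarchy. The `PARENT` map below is a made-up fragment for illustration, not the real BioCyc or SPDB hierarchy.

```python
# Hypothetical child -> parent links; "Entity" is an assumed root.
PARENT = {
    "Complex": "Protein",
    "Monomer": "Protein",
    "Protein": "Entity",
    "Compound": "Entity",
    "Gene": "Entity",
}


def depth(type_name):
    """Number of parent links between type_name and the root."""
    steps = 0
    while type_name in PARENT:
        type_name = PARENT[type_name]
        steps += 1
    return steps


def most_specific(types):
    """Return the type that lies deepest in the hierarchy."""
    return max(types, key=depth)
```

Under these assumptions, `most_specific(["Protein", "Complex"])` picks "Complex", matching the example on the slide.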
Participation in Reactions
Entities participate in reactions
Information includes Km data
Unsure whether condition data exists, and unsure how to access evidence data
Data Links in BioCyc
[Diagram: links among Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor; link labels: Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation]
Data Retrieval Strategy
[Diagram: links among Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor; link labels: Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation; retrieval stages numbered 1, 2, 3]
Improvements to SPDB
– Explicitly organize pathway networks and reaction networks
– Allow recursive tracing of pathway elements
Old Organization of Reaction Data
[Diagram: Pathway, Rxn]
Better Way
[Diagram: Rxn, Pathway]
Explicitly link reactions in the context of individual pathways
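One way to read "link reactions in the context of individual pathways" is to key every reaction-to-reaction link by the pathway it belongs to, so the same reaction can have different neighbours in different pathways. The `PathwayNetwork` class and all identifiers below are hypothetical, not part of SPDB.

```python
from collections import defaultdict


class PathwayNetwork:
    """Reaction-to-reaction links, stored separately for each pathway."""

    def __init__(self):
        # pathway -> upstream reaction -> set of downstream reactions
        self._links = defaultdict(lambda: defaultdict(set))

    def link(self, pathway, upstream, downstream):
        """Record that `downstream` follows `upstream` within `pathway`."""
        self._links[pathway][upstream].add(downstream)

    def successors(self, pathway, reaction):
        """Downstream reactions of `reaction`, restricted to `pathway`."""
        return sorted(self._links.get(pathway, {}).get(reaction, set()))


net = PathwayNetwork()
net.link("pathway-A", "rxn-1", "rxn-2")
net.link("pathway-B", "rxn-1", "rxn-3")
```

Here `rxn-1` leads to `rxn-2` in pathway-A but to `rxn-3` in pathway-B, which a single pathway-independent link table could not distinguish.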
Recursively Tracing the Data
[Diagram: links among Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor; link labels: Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation; tracing continues through Genes of TFs]
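Recursive tracing can be sketched as a depth-first walk over the data links, with a visited set so that loops (for example, a transcription factor whose own gene is regulated through the same transcription unit) terminate. The `LINKS` map and its identifiers are invented for illustration and do not come from BioCyc.

```python
# Hypothetical links: element -> elements reachable from it in one step.
LINKS = {
    "pathway-A": ["rxn-1"],
    "rxn-1": ["enzyme-1"],
    "enzyme-1": ["gene-1"],
    "gene-1": ["tu-1"],
    "tu-1": ["promoter-1"],
    "promoter-1": ["tf-1"],
    "tf-1": ["gene-tf-1"],
    "gene-tf-1": ["tu-1"],  # loops back to an element already visited
}


def trace(start, links, order=None, seen=None):
    """Return every element reachable from start, in first-visit order."""
    if order is None:
        order, seen = [], set()
    if start in seen:
        return order
    seen.add(start)
    order.append(start)
    for nxt in links.get(start, []):
        trace(nxt, links, order, seen)
    return order
```

Starting from `"pathway-A"`, the walk reaches the gene of the transcription factor and stops when it returns to `tu-1`.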
Coefficient Data for Reactions
6 ATP + 3 L-serine + 3 2,3-dihydroxybenzoate → 6 diphosphate + 6 AMP + enterobactin + 9 H+
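Storing a coefficient alongside each participant is enough to rebuild the reaction equation. The sketch below uses the enterobactin reaction above; the list-of-pairs layout and the function names are assumptions, not the SPDB representation.

```python
# (coefficient, compound name) pairs for each side of the reaction.
reactants = [(6, "ATP"), (3, "L-serine"), (3, "2,3-dihydroxybenzoate")]
products = [(6, "diphosphate"), (6, "AMP"), (1, "enterobactin"), (9, "H+")]


def format_side(side):
    """Join one side of a reaction, omitting coefficients equal to 1."""
    return " + ".join(
        name if coeff == 1 else f"{coeff} {name}" for coeff, name in side
    )


def format_reaction(reactants, products):
    """Render the full equation with an ASCII arrow."""
    return f"{format_side(reactants)} -> {format_side(products)}"
```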
To Do
– MIAME experimental conditions
– Explore other data in BioCyc
Flow of Data (The Big Picture)
– Data is imported from BioCyc (EcoCyc + MetaCyc)
– Changes can be made to BioCyc via Cell Designer, which will then be propagated to SPDB
– BioMart is one option to directly view data in SPDB
[Diagram labels: BioCyc PGDB (Lisp-based, object-oriented DB); JavaCycConnection and BioCycImporter (API based on JavaCyc); SPDB (MySQL); Cell Designer; BioMart; Researcher]
Data in BioCyc and SPDB
Pathways: 242 (excludes superpathways)
Reactions: (1751 not inferred, 4373 ‘orphaned’)
Enzymes
Transporters
Gene product summaries
Genes
Transcription Units
Citations
18,469 17,842 --
SPDB Networks
BioCyc Updates
2006: March 13, May 19, September 8
2007: January 10, March 16, May 25, August 15, December 5
2008: April 1, June 27, October 15
2009: March 9, June 19
Update history shows from 1 to 5 updates per year (~3 times a year on average)
Will have to manually check for updates and import new data into our database
“Actual curation of the data occurs within BioCyc, and the information is periodically propagated to RegulonDB.”
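The per-year average can be checked directly from the update dates listed above. This is a small arithmetic sketch; it counts only the dates shown on the slide and treats 2009 as a full year even though the list stops in June.

```python
from collections import Counter
from datetime import date

# Update dates as listed on the slide.
UPDATES = [
    date(2006, 3, 13), date(2006, 5, 19), date(2006, 9, 8),
    date(2007, 1, 10), date(2007, 3, 16), date(2007, 5, 25),
    date(2007, 8, 15), date(2007, 12, 5),
    date(2008, 4, 1), date(2008, 6, 27), date(2008, 10, 15),
    date(2009, 3, 9), date(2009, 6, 19),
]

per_year = Counter(d.year for d in UPDATES)   # updates in each calendar year
average = len(UPDATES) / len(per_year)        # mean updates per listed year
```

Thirteen updates over four listed years gives an average a little above 3 per year, consistent with the "~3 times a year" estimate.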
SPDB Schema (Simple Pathway DataBase)
[Schema diagram. Labels: Compound, Complex, Gene, TranscriptionUnit, Promoter, Monomer, Frame; Reactant, Product, Modifier, Cofactor, Activator, Repressor, Promoter; Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor]