Data Content of the BioCyc Databases
BioCyc Tier 1 Databases
SRI International Bioinformatics

EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
• Review-level Model-Organism Database for E. coli
• Tracks evolving annotation of the E. coli genome and cellular networks
• The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
• Positions of genes; functions of gene products – 76% / 66% exp
• Gene Ontology terms; MultiFun terms
• Gene product summaries and literature citations
• Evidence codes
• Multimeric complexes
• Metabolic pathways
• Regulation of gene expression and of protein activity
Nuc. Acids Res. 35: ASM News 70: Science 293:2040
Karp, Gunsalus, Collado-Vides, Paulsen

EcoCyc = E. coli Dataset + Pathway/Genome Navigator
Genes: 4,492
Proteins: 4,479
Complexes: 895
RNAs: 285
Reactions:
  Metabolic: 1,394
  Transport: 246
Pathways: 246
Compounds: 1,830
URL: EcoCyc.org
Gene Regulation:
  Operons: 3,369
  Transcription Factors: 196
  Promoters: 1,796
  TF Binding Sites: 2,205
EcoCyc v13.6
Citations: 19,000

EcoCyc Gene and Protein Information
• Gene locations and protein functions are updated through literature curation and in collaboration with RefSeq, EcoGene, and UniProt
• EcoCyc curators author minireview summaries for gene products, complexes, pathways, and transcription units
• Gene Ontology terms are curated by EcoCyc and imported regularly from UniProt
• Protein features are imported regularly from UniProt

EcoCyc Regulation
Multiple types of regulatory information present in EcoCyc
• Transcriptional regulation and operon organization
• Attenuation
• Regulation of translation by small RNAs and proteins
• Regulation of protein activity by covalent and non-covalent means

Other E. coli Genomes in BioCyc
• Currently BioCyc contains ~40 other E. coli and Shigella genomes
• New genomes will be included from RefSeq as BioCyc expands
• SRI is building orthology-based curation tools that will allow us to propagate curation from EcoCyc to these other strains

EcoCyc Accelerates Science
Experimentalists
• E. coli experimentalists
• Experimentalists working with other microbes
• Analysis of expression data
Computational biologists
• Biological research using computational methods
• Genome annotation
• Study connectivity of E. coli metabolic network
• Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
Bioinformaticists
• Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions
Metabolic engineers
• “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
Educators

EcoliHub Resource
Hub search
• Simultaneously searches 12 different E. coli databases
EcoliHub Omics
• Omics data repository and analysis for E. coli
EcoliHouse
• Queryable MySQL server containing multiple E. coli databases
EcoliWiki
• Community-contributed content about E. coli

MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
• P. Karp, R. Caspi, C. Fulcher, SRI International
• L. Mueller, A. Pujar, Boyce Thompson Institute
• S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008

MetaCyc Data – Version 14.0
Pathways: 1,471
Reactions: 8,409
Enzymes: 6,198
Small Molecules: 8,572
Organisms: 1,861
Citations: 22,459

Taxonomic Distribution of MetaCyc Pathways – Version 13.1
Bacteria: 883
Green Plants: 607
Fungi: 199
Mammals: 159
Archaea: 112

MetaCyc Pathway Ontology
Provides a classification system for metabolic pathways

Biosynthesis [902]
• Amino acids Biosynthesis [105]
• Aromatic Compounds Biosynthesis [13]
• Carbohydrates Biosynthesis [70]
• Cell structures Biosynthesis [31]
• Cofactors, Prosthetic Groups, Electron Carriers Biosynthesis [160]
• Hormones Biosynthesis [40]
• Fatty Acids and Lipids Biosynthesis [101]
• Metabolic Regulators Biosynthesis [4]
• Nucleosides and Nucleotides Biosynthesis [20]
• Amines and Polyamines Biosynthesis [32]
• Secondary Metabolites Biosynthesis [351]
  ◦ Antibiotic Biosynthesis [20]
  ◦ Fatty Acid Derivatives Biosynthesis [7]
  ◦ Flavonoids Biosynthesis [70]
  ◦ Nitrogen-Containing Secondary Compounds Biosynthesis [64]
    – Alkaloids Biosynthesis [43]
  ◦ Phenylpropanoid Derivatives Biosynthesis [46]
  ◦ Phytoalexins Biosynthesis [25]
  ◦ Sugar Derivatives Biosynthesis [10]
  ◦ Terpenoids Biosynthesis [103]
• Siderophore Biosynthesis [7]

Degradation/Utilization/Assimilation [639]
• Alcohols Degradation [14]
• Aldehyde Degradation [12]
• Amines and Polyamines Degradation [40]
• Amino Acids Degradation [113]
• Aromatic Compounds Degradation [152]
• C1 Compounds Utilization and Assimilation [24]
• Carbohydrates Degradation [52]
• Carboxylates Degradation [30]
• Chlorinated Compounds Degradation [39]
• Cofactors, Prosthetic Groups, Electron Carriers Degradation [2]
• Fatty Acid and Lipids Degradation [18]
• Inorganic Nutrients Metabolism [72]
  ◦ Nitrogen Compounds Metabolism [15]
  ◦ Phosphorus Compounds Metabolism [3]
  ◦ Sulfur Compounds Metabolism [54]
• Nucleosides and Nucleotides Degradation and Recycling [9]
• Secondary Metabolites Degradation [58]
  ◦ Nitrogen Containing Secondary Compounds Degradation [13]
  ◦ Sugar Derivatives Degradation [31]
  ◦ Terpenoids Degradation [10]

Detoxification [16]
• Acid Resistance [2]
• Arsenate Detoxification [3]
• Mercury Detoxification [1]
• Methylglyoxal Detoxification [8]

Generation of precursor metabolites and energy [124]
• Chemoautotrophic Energy Metabolism [14]
  ◦ Hydrogen Oxidation [2]
• Electron Transfer [11]
• Fermentation [34]
• Glycolysis [6]
• Methanogenesis [12]
• Pentose Phosphate Pathways [4]
• Photosynthesis [6]
• Respiration [25]
  ◦ Aerobic Respiration [9]
  ◦ Anaerobic Respiration [14]
• TCA cycle [9]

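The class hierarchy on the ontology slides can be treated as a tree. The sketch below is only an illustration and not how MetaCyc stores its ontology: the `PathwayClass` type and the `lineage` helper are hypothetical names, and only a few classes from the "Generation of precursor metabolites and energy" slide are transcribed, with the pathway counts shown in brackets there.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PathwayClass:
    """Hypothetical node type: one ontology class and its pathway count."""
    name: str
    count: int
    children: List["PathwayClass"] = field(default_factory=list)


def lineage(node: PathwayClass, target: str,
            path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Return the class names from the root down to `target`, or None."""
    path = path + (node.name,)
    if node.name == target:
        return list(path)
    for child in node.children:
        found = lineage(child, target, path)
        if found is not None:
            return found
    return None


# A partial transcription of one slide (counts as shown in brackets there).
energy = PathwayClass("Generation of precursor metabolites and energy", 124, [
    PathwayClass("Chemoautotrophic Energy Metabolism", 14, [
        PathwayClass("Hydrogen Oxidation", 2),
    ]),
    PathwayClass("Fermentation", 34),
    PathwayClass("Respiration", 25, [
        PathwayClass("Aerobic Respiration", 9),
        PathwayClass("Anaerobic Respiration", 14),
    ]),
])

respiration = energy.children[2]
child_total = sum(child.count for child in respiration.children)
anaerobic_path = lineage(energy, "Anaerobic Respiration")
```

On the slide's figures, the two Respiration subclasses total 23 against a class count of 25, so a parent's count should not be read as the sum of its listed subclasses.
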
Tier 3 Databases
Curation Level
EcoCyc and MetaCyc have many types of data that you will not see in Tier 3 databases
Examples:
• Regulation
• Minireview summaries
• Citations
• GO terms
• Protein features

BioCyc Ortholog Data
Currently, BioCyc ortholog data are obtained from CMR all-vs-all protein BLAST comparisons
Orthologs require bidirectional best BLAST hits with at least 10% identity, at least 40% similarity, and a P-value under 1
Not all organisms currently have ortholog data
• CMR lacks entries for some organisms
• Some BioCyc genomes were not obtained from CMR
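
The bidirectional-best-hit criterion on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CMR or BioCyc pipeline: the `Hit` record, the function names, and the example hits are all hypothetical, and ranking "best" by bit score is an assumption the slide does not specify. The thresholds are the ones quoted on the slide.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit between two proteins."""
    query: str
    subject: str
    identity: float    # percent identity
    similarity: float  # percent similarity
    p_value: float
    score: float       # assumed ranking key for "best" hit


def passes(hit: Hit, min_identity: float = 10.0,
           min_similarity: float = 40.0, max_p: float = 1.0) -> bool:
    """Apply the slide's thresholds to a single hit."""
    return (hit.identity >= min_identity
            and hit.similarity >= min_similarity
            and hit.p_value < max_p)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to the subject of its highest-scoring passing hit."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit],
                            b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up hits for two genomes A and B.
a_vs_b = [
    Hit("a1", "b1", 62.0, 78.0, 1e-40, 310.0),
    Hit("a1", "b2", 30.0, 48.0, 1e-5, 90.0),
    Hit("a2", "b2", 45.0, 60.0, 1e-20, 150.0),
    Hit("a3", "b2", 25.0, 45.0, 1e-3, 60.0),   # b2 prefers a2, so no pair
    Hit("a4", "b3", 8.0, 35.0, 0.5, 20.0),     # fails identity/similarity
]
b_vs_a = [
    Hit("b1", "a1", 62.0, 78.0, 1e-40, 310.0),
    Hit("b2", "a1", 30.0, 48.0, 1e-5, 90.0),
    Hit("b2", "a2", 45.0, 60.0, 1e-20, 150.0),
    Hit("b3", "a4", 8.0, 35.0, 0.5, 20.0),
]

orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this sketch, thresholds are applied before choosing the best hit; filtering after choosing the best hit is an equally plausible reading of the slide and could give different pairs.
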