Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pathway Tools User Group Meeting Introduction Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org.

Similar presentations


Presentation on theme: "Pathway Tools User Group Meeting Introduction Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org."— Presentation transcript:

1 Pathway Tools User Group Meeting Introduction Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com BioCyc.org EcoCyc.org MetaCyc.org HumanCyc.org

2 SRI International Bioinformatics Overview Goals of meeting Terminology Pathway Tools and BioCyc – The Big Picture Updates to EcoCyc and MetaCyc More information Optional: Speakers contribute talks to web site

3 SRI International Bioinformatics Meeting Goals Share experiences on how to make optimal use of Pathway Tools and BioCyc What new add-on tools are people developing that others might want to use? Coordinate future software development by SRI and other groups l What software enhancements are needed? l Example: New inference modules – GO terms, cell location Give us feedback on how we can better serve you

4 SRI International Bioinformatics Terminology Databases vs Software xCyc’s vs Pathway Tools

5 SRI International Bioinformatics BioCyc Collection of Pathway/Genome Databases Pathway/Genome Database (PGDB) – combines information about l Pathways, reactions, substrates l Enzymes, transporters l Genes, replicons l Transcription factors/sites, promoters, operons Tier 1: Literature-Derived PGDBs l MetaCyc l EcoCyc -- Escherichia coli K-12 l BioCyc Open Chemical Database Tier 2: Computationally-derived DBs, Some Curation -- 18 PGDBs l HumanCyc l Mycobacterium tuberculosis Tier 3: Computationally-derived DBs, No Curation -- 145 DBs

6 SRI International Bioinformatics Terminology – Pathway Tools Software PathoLogic l Predicts operons, metabolic network, pathway hole fillers, from genome l Computational creation of new Pathway/Genome Databases Pathway/Genome Editors l Distributed curation of PGDBs l Distributed object database system, interactive editing tools Pathway/Genome Navigator l WWW publishing of PGDBs l Querying, visualization of pathways, chromosomes, operons l Analysis operations u Pathway visualization of gene-expression data u Global comparisons of metabolic networks Bioinformatics 18:S225 2002

7 SRI International Bioinformatics BioCyc Tier 3 145 PGDBs l 130 prokaryotic PGDBs created by SRI u Source: CMR database l 15 prokaryotic and eukaryotic PGDBs created by EBI u Source: UniProt Automated processing by PathoLogic l Pathway prediction l Operon prediction (bacteria) l Pathway hole filler predictions All PGDBs available for adoption

8 SRI International Bioinformatics Family of Pathway/Genome Databases MetaCyc EcoCyc CauloCyc AraCyc MtbRvCyc HumanCyc

9 SRI International Bioinformatics Pathway/Genome DBs Created by External Users More than 500 licensees of Pathway Tools 50 groups applying the software to more than 80 organisms Software freely available to academics; Each PGDB owned by its creator Saccharomyces cerevisiae, SGD project, Stanford University l pathway.yeastgenome.org/biocyc / TAIR, Carnegie Institution of Washington Arabidopsis.org:1555 dictyBase, Northwestern University GrameneDB, Cold Spring Harbor Laboratory Planned: l CGD (Candida albicans), Stanford University l MGD (Mouse), Jackson Laboratory l RGD (Rat), Medical College of Wisconsin l WormBase ( C. elegans ), Caltech DOE Genomes to Life contractors: l G. Church, Harvard, Prochlorococcus marinus MED4 l E. Kolker, BIATECH, Shewanella onedensis l J. Keasling, UC Berkeley, Desulfovibrio vulgaris Plasmodium falciparum, Stanford University l plasmocyc.stanford.edu Fiona Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa Methanococcus janaschii, EBI maine.ebi.ac.uk:1555

10 SRI International Bioinformatics EcoCyc Project – EcoCyc.org E. coli Encyclopedia l Model-Organism Database for E. coli l Computational symbolic theory of E. coli l Electronic review article for E. coli u 10,500 literature citations u 3600 protein comments l Tracks the evolving annotation of the E. coli genome l Resource for microbial genome annotation Collaborative development via Internet l John Ingraham (UC Davis) l Paulsen (TIGR) – Transport, flagella, DNA repair l Collado (UNAM) -- Regulation of gene expression l Keseler, Shearer (SRI) -- Metabolic pathways, cell division, proteases l Karp (SRI) -- Bioinformatics Nuc. Acids. Res. 33:D334 2005 ASM News 70:25 2004 Science 293:2040

11 SRI International Bioinformatics Comments in Proteins, Pathways, Operons, etc.

12 SRI International Bioinformatics EcoCyc Accelerates Science Experimentalists l E. coli experimentalists l Experimentalists working with other microbes l Analysis of expression data Computational biologists l Biological research using computational methods l Genome annotation l Study connectivity of E. coli metabolic network l Study organization of E. coli metabolic enzymes into structural protein families l Study phylogentic extent of metabolic pathways and enzymes in all domains of life Bioinformaticists l Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions, Metabolic engineers l “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents “ Educators

13 SRI International Bioinformatics MetaCyc: Metabolic Encyclopedia Nonredundant metabolic pathway database Describe a representative sample of every experimentally determined metabolic pathway Literature-based DB with extensive references and commentary Pathways, reactions, enzymes, substrates Jointly developed by SRI and Carnegie Institution Nucleic Acids Research 32:D438-442 2004

14 SRI International Bioinformatics MetaCyc Curation DB updates by 5 staff curators l Information gathered from biomedical literature l Emphasis on microbial and plant pathways l More prevalent pathways given higher priority l Curator’s Guide lists curation conventions Review-level database Four releases per year Quality assurance of data and software: l Evaluate database consistency constraints l Perform element balancing of reactions l Run other checking programs l Display every DB object

15 SRI International Bioinformatics MetaCyc Curation Ontologies guide querying l Pathways (recently revised), compounds, enzymatic reactions l Example: Coenzyme M biosynthesis Extensive citations and commentary Evidence codes l Controlled vocabulary of evidence types l Attach to pathways and enzymes: u Code : Citation : Curator : date Release notes explain recent updates l http://biocyc.org/metacyc/release-notes.shtml http://biocyc.org/metacyc/release-notes.shtml

16 SRI International Bioinformatics MetaCyc Data

17 SRI International Bioinformatics MetaCyc Pathway Variants Pathways that accomplish similar biochemical functions using different biochemical routes l Alanine biosynthesis I – E. coli l Alanine biosynthesis II – H. sapiens Pathways that accomplish similar biochemical functions using similar sets of reactions l Several variants of TCA Cycle

18 SRI International Bioinformatics MetaCyc Super-Pathways Groups of pathways linked by common substrates Example: Super-pathway containing l Chorismate biosynthesis l Tryptophan biosynthesis l Phenylalanine biosynthesis l Tyrosine biosynthesis Super-pathways defined by listing their component pathways Multiple levels of super-pathways can be defined Pathway layout algorithms accommodate super-pathways

19 SRI International Bioinformatics More Information 200+ pages of documentation available: User’s Guide, Schema Guide, Curator’s Guide Pathway Tools source code available Active community of contributors Read the release notes!

20 SRI International Bioinformatics Behind the Scenes 330,000 lines of code, mostly Common Lisp 4.5 programmers Extensive QA on each release Bug tracking using Bugzilla

21 SRI International Bioinformatics The Common Lisp Programming Environment Gatt studied Lisp and Java implementation of 16 programs by 14 programmers (Intelligence 11:21 2000)

22 SRI International Bioinformatics Peter Norvig’s Solution “I wrote my version in Lisp. It took me about 2 hours (compared to a range of 2-8.5 hours for the other Lisp programmers in the study, 3-25 for C/C++ and 4-63 for Java) and I ended up with 45 non-comment non-blank lines (compared with a range of 51-182 for Lisp, and 107-614 for the other languages). (That means that some Java programmer was spending 13 lines and 84 minutes to provide the functionality of each line of my Lisp program.)” http://www.norvig.com/java-lisp.html

23 SRI International Bioinformatics Common Lisp Programming Environment General-purpose language, not just for recursive or functional programming Interpreted and/or compiled execution Fabulous debugging environment High-level language Interactive data exploration Extensive built-in libraries Dynamic redefinition Find out more! l See ALU.org or l http://www.international-lisp-conference.org/http://www.international-lisp-conference.org/

24 SRI International Bioinformatics Pathway Tools WWW Server

25 SRI International Bioinformatics Summary Pathway/Genome Databases l MetaCyc non-redundant DB of literature-derived pathways l 165 organism-specific PGDBs available through SRI at BioCyc.org l Computational theories of biochemical machinery Pathway Tools software l Extract pathways from genomes l Morph annotated genome into structured ontology l Distributed curation tools for MODs l Query, visualization, WWW publishing

26 SRI International Bioinformatics BioCyc and Pathway Tools Availability WWW BioCyc freely available to all l BioCyc.org BioCyc DBs freely available to non-profits l Flatfiles downloadable from BioCyc.org Pathway Tools freely available to non-profits l PC/Windows, PC/Linux, SUN

27 SRI International Bioinformatics Acknowledgements SRI l Suzanne Paley, Michelle Green, Ron Caspi, Ingrid Keseler, John Pick, Carol Fulcher, Markus Krummenacker, Alex Shearer EcoCyc Project Collaborators l Julio Collado-Vides, John Ingraham, Ian Paulsen MetaCyc Project Collaborators l Sue Rhee, Peifen Zhang, Hartmut Foerster And l Harley McAdams Funding sources: l NIH National Center for Research Resources l NIH National Institute of General Medical Sciences l NIH National Human Genome Research Institute l Department of Energy Microbial Cell Project l DARPA BioSpice, UPC BioCyc.org


Download ppt "Pathway Tools User Group Meeting Introduction Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International BioCyc.org EcoCyc.org."

Similar presentations


Ads by Google