Building and Refining AraCyc: Data Content, Sources, and Methodologies Kate Dreher TAIR, AraCyc, PMN Carnegie Institution for Science.

Presentation transcript:

 AraCyc – Arabidopsis Metabolic EnCyclopedia Database of metabolic pathways found in Arabidopsis AraCyc  Accessible from : TAIR – The Arabidopsis Information Resource 

 AraCyc – Arabidopsis Metabolic EnCyclopedia Database of metabolic pathways found in Arabidopsis AraCyc  Accessible from : PMN – Plant Metabolic Network 

AraCyc Pathway pages: Pathway, Enzyme, Gene, Reaction, Compound, Evidence Code, plus additional curated information

AraCyc Pathway pages: Classification, Summary, References, Superpathways, Pathway variants

AraCyc Pathway pages: Pathway, Enzyme, Gene, Reaction, Compound, Evidence Code

AraCyc Compound pages (example: CDP-choline): Synonyms, Classification(s), Molecular Weight / Formula, Appears as Reactant, Appears as Product

AraCyc Enzyme detail pages (example: phosphatidyltransferase): Multifunctional protein

AraCyc Enzyme detail pages (example: phosphatidyltransferase): Reaction, Pathway(s), Summary, References, Inhibitors, Kinetic Parameters, etc.

AraCyc Pathway pages: Pathway, Enzyme, Gene, Reaction, Compound, Evidence Code. To TAIR...

AraCyc 4.5 (released June 2008) Pathways288 Compounds1956 Reactions1723 Citations2279  More detailed information available in the Release Notes

PlantCyc 1.0 (released June 2008): 508 pathways, 2,314 compounds, 2,277 reactions, 4,208 citations, 292 species

Putting AraCyc (and PlantCyc) to use  Reference information Pathways, Genes, Enzymes, Reactions, and Metabolites  Data Analysis (AraCyc) Use the OMICS viewer  Display the results of experiments on an Arabidopsis metabolic map  Study your data or public data sets

Putting AraCyc to use Compounds Transcripts or Proteins  Display the results of experiments on an Arabidopsis metabolic map
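The OMICS viewer paints experimental values directly onto the AraCyc metabolic map. As a rough, hypothetical sketch of the underlying idea (not the Pathway Tools implementation), the Python below groups per-gene measurements by pathway and averages them. The gene-to-pathway table, the function name `pathway_summary`, and all measurement values are made up for illustration; only the two locus IDs and their enzyme names come from the AraCyc phenylalanine example.

```python
from statistics import mean

# Hypothetical gene-to-pathway table. The locus IDs appear in the AraCyc
# phenylalanine example; this is not an actual AraCyc export.
GENE_TO_PATHWAYS = {
    "AT1G69370": ["phenylalanine biosynthesis"],  # chorismate mutase
    "AT2G27820": ["phenylalanine biosynthesis"],  # arogenate dehydratase
}


def pathway_summary(measurements, gene_to_pathways):
    """Average per-gene measurements (e.g. log2 fold changes) for each pathway.

    Genes without a pathway assignment are skipped, since a metabolic map
    only has a place to draw genes that are mapped to pathways.
    """
    grouped = {}
    for gene, value in measurements.items():
        for pathway in gene_to_pathways.get(gene, []):
            grouped.setdefault(pathway, []).append(value)
    return {pathway: mean(values) for pathway, values in grouped.items()}


# Made-up values for illustration only.
example = {"AT1G69370": 2.0, "AT2G27820": 1.0, "UNMAPPED_GENE": 3.0}
print(pathway_summary(example, GENE_TO_PATHWAYS))
# {'phenylalanine biosynthesis': 1.5}
```

Averaging is only one possible summary; the OMICS viewer itself shows each value individually on the map rather than collapsing them.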

Putting AraCyc (and PlantCyc) to use  Reference information Pathways, Genes, Enzymes, Reactions, and Metabolites  Data Analysis (AraCyc) Use the OMICS viewer  Display the results of experiments on an Arabidopsis metabolic map  Study your data or public data sets Generate new hypotheses  Find metabolic differences in your mutant with “no phenotype”  Identify pathways that are related to your favorite biological process See more at “Advanced Bioinformatic Resources for Arabidopsis”  Thursday, July 24, 7 PM in the Grand Salon  Enzyme discovery Fill “pathway holes” through comparative analyses

Putting AraCyc (and PlantCyc) to use: pathway “hole filling”
[Figure: Choline Biosynthesis I, starting from ethanolamine, with an unknown step (??????) in AraCyc; Spinach and Soybean data in PlantCyc are used to fill the pathway “hole”.]
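Comparative hole filling can be sketched in two steps: find the reactions in a pathway that have no enzyme assigned in the species, then look up enzymes from other species that catalyze the same reaction in a multi-species reference database such as PlantCyc. The Python below is a hypothetical illustration of that bookkeeping only; the reaction IDs, enzyme IDs, and function names are invented, and only the species names come from the slide. The homology search that would actually nominate an Arabidopsis gene is not shown.

```python
def find_pathway_holes(pathway_reactions, species_enzymes):
    """Return the reactions of a pathway that have no enzyme assigned in a species."""
    return [rxn for rxn in pathway_reactions if not species_enzymes.get(rxn)]


def hole_candidates(holes, reference_enzymes):
    """For each hole, collect (species, enzyme) pairs from a multi-species
    reference database that catalyze the same reaction. Their sequences could
    then seed homology searches against the Arabidopsis genome."""
    return {rxn: reference_enzymes.get(rxn, []) for rxn in holes}


# Hypothetical reaction and enzyme IDs; only the species names come from the slide.
pathway = ["RXN-1", "RXN-2", "RXN-3"]
arabidopsis = {"RXN-1": ["At-enzyme-1"], "RXN-3": ["At-enzyme-3"]}
plantcyc = {
    "RXN-2": [("Spinach", "So-enzyme-2"), ("Soybean", "Gm-enzyme-2")],
}

holes = find_pathway_holes(pathway, arabidopsis)
print(holes)  # ['RXN-2']
print(hole_candidates(holes, plantcyc))
# {'RXN-2': [('Spinach', 'So-enzyme-2'), ('Soybean', 'Gm-enzyme-2')]}
```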

Data sources and data flow
[Diagram elements: Research Community; Experimental Data (genes, proteins, metabolites); Published literature; Data repositories; Computational predictions; Community submissions; Curators; Metabolic Pathway Databases]

Data sources and data flow  Information enters metabolic pathway database in two stages Stage 1: Initial build Stage 2: Updates and improvements  AraCyc 1.0 – Initial Build

Initial AraCyc Build (2002)  7900 Arabidopsis genes annotated to the GO term ‘catalytic activity’  4900 loci in small molecule metabolism 19% of the total genome  Goal: Map these loci to metabolic PATHWAYS  Solution: Use reference database: MetaCyc (460 metabolic pathways) Run PathoLogic program (SRI International) Predict metabolic pathways present in Arabidopsis

MetaCyc  Multi-kingdom metabolic pathway database METAbolic EnCYClopedia SRI International (  First released in 1999  All pathways generated by curators extracting information from the scientific literature  Only contains pathways with experimental support  Reference database Used to create SINGLE SPECIES databases... including AraCyc in 2002!

Initial AraCyc Build (2002)
[Figure: PathoLogic takes an ANNOTATED GENOME (DNA sequences, gene calls, gene functions, e.g. AT1G69370 chorismate mutase and AT2G27820 arogenate dehydratase) and the MetaCyc pathway chorismate → prephenate → L-arogenate → L-phenylalanine (chorismate mutase, prephenate aminotransferase, arogenate dehydratase), and produces the corresponding AraCyc pathway with the Arabidopsis genes attached.]

PathoLogic Program  Matches input enzymes to reference enzymes Name Enzyme Commission (EC) number  Identifies probable pathways Enzyme coverage Predicted species distribution  Initial AraCyc 1.0 build (2002) PathoLogic inferred over 200 pathways PathoLogic mapped 940 genes to the pathways

Validation of a New Database  PathoLogic errs on the side of over-prediction  Curators validate pathways...

Validation of a New Database  Curators Find support for predicted pathways  Is the pathway described in Arabidopsis literature?  Are the crucial metabolites described in Arabidopsis literature?  Does the pathway include a unique reaction catalyzed by an Arabidopsis protein?

Validation of a New Database  Curators: Remove pathways not found in Arabidopsis  glycogen biosynthesis  C4 photosynthesis  caffeine biosynthesis Edit pathways operating via a different route  Phenylalanine biosynthesis in bacteria vs. Arabidopsis

Validation of a New Database AraCyc Pathway: phenylalanine biosynthesis  Edit pathways operating via a different route

Completion of a New Database  Curators Add Arabidopsis pathways not present in reference database Add Arabidopsis compounds, reactions, and enzymes not mapped to a pathway Assign evidence codes to pathways and enzymes

Assignment of Evidence Codes

AraCyc and beyond  Information enters metabolic pathway database in two stages Stage 1: Initial build Stage 2: Updates and improvement

Database updates and improvements
Release:   AraCyc 1.0 | AraCyc 4.5 | AraCyc 5.0
Pathways:  219        | 288        | even more!

Database updates and improvements  New rounds of computational pathway prediction New TAIR genome releases New MetaCyc releases  New round of PathoLogic prediction

Database updates and improvements  New rounds of computational pathway prediction New TAIR genome releases New reference database – PlantCyc  Part of the Plant Metabolic Network  Released in June 2008  Contains plant pathways supported by:  experimental evidence  expert hypothesis ***  Reviewed by an editorial board of biochemists  Will include enzymes from newly sequenced plant genomes and EST collections

Database updates and improvements  New rounds of computational pathway prediction Newly predicted pathways undergo pathway validation PathoLogic Program Updated pathway predictions for AraCyc Newest TAIR Genome Annotations Newest Version of PlantCyc See poster: ICAR1404

Database updates and improvements  New curator entries Curators search for new information in scientific literature TAIR curators  Assign new functional annotations to metabolic genes AraCyc curators  Manually attach enzymes to pathways  Identify new and updated pathways  Write or revise summaries

Database updates and improvements  New community submissions Jamborees  Experts meet individually with curators  Review pathways in specific metabolic domains  Provide useful references and suggest important pathways Curation Booth ******  Open during all poster sessions – Booth #1  Please come (free candy!) TAIR or PMN website

Community submissions  TAIR –

Community submissions  TAIR –

Community submissions  PMN –

Community submissions  PMN –

Community submissions  PMN Contributor page Your name here! = fame!

Acknowledgements
Current Curators:
- Peifen Zhang (Director and lead curator – metabolism)
- Tanya Berardini (lead curator – functional annotation)
- David Swarbreck (lead curator – structural annotation)
- A. S. Karthikeyan (curator)
- Donghui Li (curator)
Recent Past Curators:
- Christophe Tissier (curator)
- Hartmut Foerster (curator)
Tech Team Members:
- Bob Muller (Manager)
- Larry Ploetz (Sys. Administrator)
- Raymond Chetty
- Anjo Chi
- Vanessa Kirkup
- Cynthia Lee
- Tom Meyer
- Shanker Singh
- Chris Wilks
Metabolic Pathway Software:
- Peter Karp and SRI group (NIH)
TAIR, AraCyc, and the PMN:
- Eva Huala (Director and Co-PI)
- Sue Rhee (PI and Co-PI)

Thank you Please visit us at the Curation Booth!

Curation workflow
• Identify a pathway
• Find details of reactions
• Find details of enzymes
• Data entry:
  - Structure of substrates
  - Enzymes: EC number, kinetic parameters, inhibitors / activators, coding gene
  - Reactions
• Draw pathway diagram
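The enzyme part of the data-entry step can be pictured as a record with one field per item above, which makes it easy to see what a curator has not yet entered. The Python below is a hypothetical sketch, not the Pathway Tools schema: the `EnzymeEntry` class and its field names are invented. The example values (chorismate mutase, its standard EC number 5.4.99.5, gene AT1G69370, and the chorismate-to-prephenate reaction) follow the AraCyc phenylalanine example.

```python
from dataclasses import dataclass, field

# Fields checked by missing_fields(), in data-entry order.
ENTRY_FIELDS = ("ec_number", "coding_genes", "reactions",
                "kinetic_parameters", "inhibitors", "activators")


@dataclass
class EnzymeEntry:
    """One curated enzyme record, mirroring the data-entry items in the workflow."""
    name: str
    ec_number: str = ""
    coding_genes: list[str] = field(default_factory=list)
    reactions: list[str] = field(default_factory=list)
    kinetic_parameters: dict[str, float] = field(default_factory=dict)
    inhibitors: list[str] = field(default_factory=list)
    activators: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, i.e. details not yet entered."""
        return [name for name in ENTRY_FIELDS if not getattr(self, name)]


entry = EnzymeEntry(
    "chorismate mutase",
    ec_number="5.4.99.5",
    coding_genes=["AT1G69370"],
    reactions=["chorismate -> prephenate"],
)
print(entry.missing_fields())
# ['kinetic_parameters', 'inhibitors', 'activators']
```

An empty field does not mean the information is wrong, only that no curated value has been entered; many enzymes have no published inhibitors or kinetic data.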

Database maintenance and improvement: refine existing databases
• Genome Annotation + PathoLogic Prediction + Manual Pathway Curation
• Multi-species reference database: PlantCyc
• Single-species databases: AraCyc 4.5, RiceCyc, PoplarCyc → AraCyc 5.0, RiceCyc, PoplarCyc

Database maintenance and improvement
• Genome Annotation + PathoLogic Prediction + Manual Pathway Curation
• Multi-species reference database: PlantCyc
• Single-species databases: AraCyc 5.0, RiceCyc, PoplarCyc, MaizeCyc, and more

Database maintenance and improvement
• Genome Annotation + PathoLogic Prediction + Manual Pathway Curation
• Multi-species reference database: PlantCyc
• Single-species databases: AraCyc 10.0, RiceCyc, PoplarCyc, MaizeCyc, and more

Database maintenance and improvement: build NEW databases
• Genome Annotation + PathoLogic Prediction + Manual Pathway Curation
• Multi-species reference database: PlantCyc
• Single-species databases: AraCyc 4.5, RiceCyc, PoplarCyc