Development and Use of Controlled Vocabularies at the Arabidopsis Information Resource (TAIR)
Sue Rhee, Carnegie Institution, Dept. of Plant Biology

TAIR: A Model Organism Database for Arabidopsis thaliana
Current major data types:
- community (~11,000 people, ~4,000 labs)
- literature (~12,000 articles, ~450 reviews)
- genes and proteins (~29,000 genes, ~28,000 proteins)
- alleles and polymorphisms (~150,000)
- germplasm (~150,000; ~1,000 mutant, ~800 ecotypes)
- 'expert' gene families (~450, containing ~4,000 genes)
- microarray data (~130 experiments, ~600 hybridizations)
- metabolic pathways (~170 pathways, ~1,000 reactions)

Controlled Vocabularies
Existing:
- GO: function, process, component
- Arabidopsis anatomy, developmental stages
Under development:
- experimental methods
- environmental factors
- PO (Plant Ontology) anatomy, developmental stages
Planned:
- PO trait
Needed:
- chemical values? (qualitative and quantitative)

Developing controlled vocabularies:
- Anatomy
- Developmental stages
- Methodology
Using controlled vocabularies:
- Gene and gene product functional annotation
- Community
- Microarray experiments and array elements
- Alleles
- Germplasm

Purpose of the Anatomy and Developmental Stages Ontologies
To describe things like:
- where a gene is expressed in the plant
- what stage of development the plant was at when the RNA sample was taken
- what tissues the protein sample was derived from
- what part(s) of the plant are affected in a mutant line

Anatomy and Developmental Stages
- anatomical parts (295)
- developmental stages (69)
Sources:
- Katherine Esau (1960), The Anatomy of Seed Plants
- John Bowman (1994), Arabidopsis: An Atlas of Morphology and Development
- Meyerowitz & Somerville, eds. (1994), Arabidopsis
- numerous primary articles and websites on development/anatomy
- Stanley Letovsky (Cereon Genomics), Doug Boyes (Paradigm Genetics), Leonore Reiser (TAIR), Jonathan Clarke (JIC)

Rules for Anatomy and Developmental Stages Ontology Development
- Terms come from the literature and textbooks that describe anatomy and development (364 terms; 221 definitions).
- For anatomy, the terms must describe parts that are found in Arabidopsis (limited scope).
- Developmental stages should be based on morphological features, regardless of a time component, because different accessions reach the same stage at different times. An example is the floral developmental stages defined by John Bowman.

Methods for Anatomy and Developmental Stages Ontology Development
- Created separate anatomy and developmental stage ontologies as directed acyclic graphs (DAGs).
- Tried to make the graphs orthogonal in order to generate cross products easily.
- Creating cross products between stages and anatomy (what parts exist at what stage?).
- Creating cross products with developmental process terms.
- Used DAGEditor (BDGP) for the ontologies and, eventually, for making cross products.
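The payoff of keeping anatomy and stages orthogonal can be shown with a small sketch. Each ontology below is a DAG stored as a map from a term to its parent terms, and curator assertions of the form (part, stage) are expanded upward: if a part exists at a stage, every ancestor of that part exists at that stage too. The term names, the single collapsed edge type, and the propagation rule are illustrative assumptions, not TAIR's actual vocabularies or DAGEditor's behavior.

```python
"""Sketch: two orthogonal ontologies as DAGs, crossed into (part, stage) pairs.

All term names and the upward-propagation rule are illustrative assumptions.
"""

# Each term maps to its parent terms. Real ontologies distinguish edge types
# (e.g. is_a vs part_of); this sketch collapses them into one.
ANATOMY = {
    "plant": [],
    "shoot": ["plant"],
    "root": ["plant"],
    "leaf": ["shoot"],
    "flower": ["shoot"],
}
STAGES = {
    "whole plant development": [],
    "seedling": ["whole plant development"],
    "flowering": ["whole plant development"],
}


def ancestors(dag, term):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(dag[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(dag[parent])
    return seen


def is_acyclic(dag):
    """A graph is a DAG if no term is among its own ancestors."""
    return all(term not in ancestors(dag, term) for term in dag)


def cross_products(anatomy, stages, exists_at):
    """Expand (part, stage) assertions upward through the anatomy DAG:
    if a part exists at a stage, each of its ancestors exists at that stage too."""
    pairs = set()
    for part, stage in exists_at:
        if part not in anatomy or stage not in stages:
            raise KeyError(f"unknown term in ({part!r}, {stage!r})")
        for p in {part} | ancestors(anatomy, part):
            pairs.add((p, stage))
    return sorted(pairs)


# Hypothetical curator assertions.
EXISTS_AT = {("root", "seedling"), ("flower", "flowering")}

assert is_acyclic(ANATOMY) and is_acyclic(STAGES)
for pair in cross_products(ANATOMY, STAGES, EXISTS_AT):
    print(pair)
```

Two assertions yield five pairs, for example ('shoot', 'flowering') is inferred from ('flower', 'flowering'). Because the two graphs share no terms, neither needs to change when the other grows; only the assertion set does.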

Ontologies for Anatomy and Development

Crossing Anatomy and Developmental Stages, etc., etc., etc.

Current Status
- 221/364 terms are defined.
- Terms (definitions and relationships) are checked for accuracy (external review and literature).
- Being used to annotate genes and gene products.
- Files are available on the GOBO and TAIR FTP sites.
- In collaboration with MaizeDB and Gramene on sharing the ontologies to build a common plant ontology (probably flowering plant…)

Developing the Methods Ontology (an ontology of experimental techniques)
Sources:
- short, semi-controlled descriptions of experimental information recorded during annotation (102)
- protocols from the research community (152)
- microarray experiments (129)
Current status:
- DAG structure
- 195 terms and 3 definitions
- more structural revision needed
Leonore Reiser, Margarita Garcia-Hernandez, Gabe Lander

Developing controlled vocabularies:
- Anatomy
- Developmental stages
- Methodology
Using controlled vocabularies:
- Gene and gene product functional annotation
- Community
- Microarray experiments and array elements
- Alleles
- Germplasm

Currently Annotated Data Types
Genes and gene products (2931):
- molecular function (2599 genes to 296 terms)
- biological process (536 genes to 269 terms)
- subcellular location (695 genes to 104 terms)
- anatomy & developmental stages (117 genes to 50 terms)
- spatial and temporal expression pattern (110 genes to 52 terms)
Community (2415 community entries to 2892 terms):
- research interest (2737 terms)
- organism of interest (192 terms)

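Basic Process of Literature Curation
[Diagram: a paper supports a statement linking a subject term to an object term through a relationship type.]
- Subject term: a data object (e.g. a gene)
- Object term: a controlled vocabulary term or another data object
- Relationship types (currently 20), e.g. binds to, involved in, functions as, expressed in, is subunit of, related to, required for, located in, interacts with, regulates, more…
- Curation steps: automatic and manual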
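The subject / relationship / object pattern above maps naturally onto a small record type. The sketch below validates relationships against the types named on the slide (a deliberately incomplete list, since the slide counts 20) and answers a simple query. The field names, the `objects_for` helper, and the identifiers `GENE_X` and `PAPER_1` are hypothetical, not TAIR's schema.

```python
from dataclasses import dataclass

# Relationship types named on the slide. The slide says there are currently
# 20 types in total, so this set is deliberately incomplete.
RELATIONSHIP_TYPES = frozenset({
    "binds to", "involved in", "functions as", "expressed in",
    "is subunit of", "related to", "required for", "located in",
    "interacts with", "regulates",
})


@dataclass(frozen=True)
class Annotation:
    """One curated statement: subject --relationship--> object, backed by a paper."""
    subject: str       # data object, e.g. a gene
    relationship: str  # must be one of RELATIONSHIP_TYPES
    obj: str           # controlled-vocabulary term or another data object
    paper: str         # identifier of the supporting paper

    def __post_init__(self):
        # Reject free-text relationships so every statement stays queryable.
        if self.relationship not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.relationship!r}")


def objects_for(annotations, subject, relationship):
    """All objects linked to `subject` by `relationship`, sorted alphabetically."""
    return sorted(a.obj for a in annotations
                  if a.subject == subject and a.relationship == relationship)


# Hypothetical identifiers, for illustration only.
annotations = [
    Annotation("GENE_X", "located in", "nucleus", "PAPER_1"),
    Annotation("GENE_X", "located in", "chloroplast", "PAPER_2"),
    Annotation("GENE_X", "involved in", "flowering", "PAPER_1"),
]
print(objects_for(annotations, "GENE_X", "located in"))  # ['chloroplast', 'nucleus']
```

Keeping the paper on every statement preserves the evidence trail that literature curation depends on.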

PubSearch Statistics (10/02)
| Data type | Unique matched (unique papers) | % |
| Genes | 3182 (7485) | 8.9% |
| GO process terms | 653 (8439) | 9.5% |
| GO function terms | 830 (6537) | 15.2% |
| GO component terms | 266 (5686) | 23.1% |
| Anatomy/development terms | 213 (5583) | 58.5% |

Top ten lists:
| Process | Count | Function | Count | Component | Count |
| development | 2531 | binding | 1494 | cell | 2487 |
| growth | 1900 | enzyme | 1184 | membrane | 1031 |
| transcription | 1114 | kinase | 637 | chromosome | 842 |
| biosynthesis | 865 | receptor | 433 | chloroplast | 604 |
| flowering | 697 | beta-glucuronidase | 413 | plasma membrane | 408 |
| transduction | 684 | protein kinase | 309 | cell wall | 291 |
| transport | 659 | hormone | 302 | plastid | 270 |
| signal transduction | 625 | DNA binding | 299 | nucleus | 258 |
| germination | 455 | transcription factor | 269 | intracellular | 236 |
| metabolism | 425 | transporter | 230 | host | 230 |
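Statistics like "unique papers per term" come from matching vocabulary terms against paper text. The slide does not describe how PubSearch matches terms, so the following is only a naive illustration, assuming case-insensitive whole-word matching and counting each paper at most once per term. The paper texts are invented.

```python
import re
from collections import Counter


def count_term_papers(papers, terms):
    """For each term, count the distinct papers whose text mentions it.

    Matching is case-insensitive and anchored on word boundaries, so
    "membrane" matches inside "plasma membrane" but not inside "membranes".
    """
    patterns = {t: re.compile(r"\b" + re.escape(t.lower()) + r"\b") for t in terms}
    counts = Counter()
    for text in papers.values():
        lowered = text.lower()
        for term, pattern in patterns.items():
            if pattern.search(lowered):
                counts[term] += 1  # at most once per paper
    return counts


# Invented paper texts, for illustration only.
papers = {
    "paper_1": "Signal transduction at the plasma membrane.",
    "paper_2": "Membrane transport and growth of the root.",
    "paper_3": "Root growth in the dark.",
}
terms = ["membrane", "plasma membrane", "growth", "signal transduction", "transduction"]
print(dict(sorted(count_term_papers(papers, terms).items())))
```

One consequence of this naive matching is that a general term is also counted whenever a longer phrase containing it appears ("transduction" inside "signal transduction"), and plurals are missed. Whether PubSearch handles these cases differently is not stated on the slide.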

PubSearch Annotation User Interface

Expanding Data Objects to Annotate
- microarray experiments
- RNA samples
- array elements (ESTs, oligos, PCR products)
- alleles (natural variant & mutant forms)
- germplasm (ecotypes & mutant lines)

Microarray Data Annotation
Experiments:
- goals (e.g. GO process)
- variables (e.g. anatomy, environment, chemical); these need a qualifier (e.g. a values ontology?)
- type/category (e.g. methods)
RNA samples:
- germplasm
- biomaterial (e.g. anatomy, developmental stages)
- external conditions (e.g. methods, environment, chemical)
Array elements:
- affected by/in XXX (e.g. GO process, anatomy, developmental stages)
- induced during/in XXX
- reduced during/in XXX

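Phenotype Annotation
[Diagram: a germplasm is annotated with a trait and trait value, the methodology used, and a condition (environment, chemical) with a condition value; traits draw on anatomy, biological process, etc.]
Example:
- Germplasm: hy4-1 mutant line
- Trait: height (anatomy: hypocotyl)
- Trait value: long
- Methodology: measure with ruler
- Condition: light
- Condition value: absence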
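The phenotype model can be written down as one record whose fields mirror the boxes in the diagram. This is a sketch under stated assumptions: splitting the trait into an entity (an anatomy or biological-process term) and an attribute is this example's choice, and the field names and `summary` format are hypothetical. The values are the hy4-1 example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """One phenotype observation, with fields named after the slide's boxes.

    Splitting the trait into an entity (anatomy or biological-process term)
    and an attribute is an assumption of this sketch.
    """
    germplasm: str
    trait_entity: str      # e.g. an anatomy term such as "hypocotyl"
    trait_attribute: str   # e.g. "height"
    trait_value: str
    methodology: str
    condition: str         # an environment or chemical term
    condition_value: str

    def summary(self) -> str:
        return (f"{self.germplasm}: {self.trait_entity} {self.trait_attribute} = "
                f"{self.trait_value} [{self.methodology}; "
                f"{self.condition} = {self.condition_value}]")


hy4 = PhenotypeAnnotation(
    germplasm="hy4-1 mutant line",
    trait_entity="hypocotyl",
    trait_attribute="height",
    trait_value="long",
    methodology="measure with ruler",
    condition="light",
    condition_value="absence",
)
print(hy4.summary())
# hy4-1 mutant line: hypocotyl height = long [measure with ruler; light = absence]
```

Because each field holds a term rather than free text, the same record shape can draw on the anatomy, methods, and environment vocabularies listed earlier.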

Expanding Types of Annotations
By using more relationship types rather than more terms in an ontology. For example:
- Gene to gene family. Relationship type: is a member of
- Molecular interactions. Relationship types: represses, activates, binds to
- Genetic interactions. Relationship types: suppresses, enhances
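A minimal sketch of the "more relationship types, not more terms" approach: each annotation category gets a small set of allowed relationship types (taken from the slide), and a new statement is checked against that set. The function name and the `GENE_A`/`GENE_B`/`FAMILY_1` identifiers are hypothetical.

```python
# Relationship types from the slide, grouped by the kind of annotation they serve.
ALLOWED_RELATIONSHIPS = {
    "gene family": frozenset({"is a member of"}),
    "molecular interaction": frozenset({"represses", "activates", "binds to"}),
    "genetic interaction": frozenset({"suppresses", "enhances"}),
}


def make_annotation(category, subject, relationship, obj):
    """Return a (subject, relationship, object) triple after checking that the
    relationship type is allowed for the given annotation category."""
    allowed = ALLOWED_RELATIONSHIPS.get(category)
    if allowed is None:
        raise ValueError(f"unknown annotation category: {category!r}")
    if relationship not in allowed:
        raise ValueError(f"{relationship!r} is not allowed for {category!r}")
    return (subject, relationship, obj)


# Hypothetical gene and family identifiers.
print(make_annotation("gene family", "GENE_A", "is a member of", "FAMILY_1"))
print(make_annotation("genetic interaction", "GENE_A", "suppresses", "GENE_B"))
```

Supporting a new kind of statement then means adding one relationship type to a set, rather than minting a new ontology term for every gene-specific variant.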

A Model of the Control of Flowering in Arabidopsis (from “Molecular Genetics of Plant Development”)

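Generating the image from the database
| Gene1 | Relationship | Gene2 |
| ELF3 | Represses | GA1 |
| ELF3 | Represses | SPY |
| ELF3 | Activates | EMF1 |
[Diagram: a legend assigns one symbol to "Represses" and another to "Activates"; the network is drawn up one row at a time: ELF3 with GA1, then GA1 and SPY, then GA1, SPY and EMF1.]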
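Turning the Gene1 / Relationship / Gene2 rows into a picture can be sketched by emitting Graphviz DOT text. The slide does not say which drawing tool TAIR used, and its legend symbols were lost, so the choice of Graphviz and of a "tee" arrowhead for repression and a "normal" arrowhead for activation are assumptions. The rows themselves are the ones shown on the slide.

```python
# Rows as they might come back from a Gene1 / Relationship / Gene2 table.
ROWS = [
    ("ELF3", "Represses", "GA1"),
    ("ELF3", "Represses", "SPY"),
    ("ELF3", "Activates", "EMF1"),
]

# Graphviz arrowheads chosen to distinguish the two relationship types; the
# slide's own legend symbols were lost, so this mapping is an assumption.
ARROWHEAD = {"Activates": "normal", "Represses": "tee"}


def to_dot(rows):
    """Render (gene1, relationship, gene2) rows as a Graphviz DOT digraph."""
    lines = ["digraph interactions {"]
    for gene1, relationship, gene2 in rows:
        lines.append(
            f'    "{gene1}" -> "{gene2}" '
            f'[arrowhead={ARROWHEAD[relationship]}, label="{relationship}"];'
        )
    lines.append("}")
    return "\n".join(lines)


print(to_dot(ROWS))
```

The printed text can be rendered with the Graphviz command-line tool, e.g. `dot -Tpng`. Because the picture is derived from the table, adding a row to the database updates the diagram with no manual redrawing.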

Genetic Interaction / Transcriptional Regulation Pathways

Acknowledgements
Leonore Reiser, Tanya Berardini, Suparna Mundodi, Margarita Garcia-Hernandez, Eva Huala, Lukas Mueller, Peifen Zhang
Aisling Doyle, J. Yoon, Gabe Lander
Danny Yoo, Iris Xu
Jonathan Clarke (John Innes Institute)
GO, TIGR, Monsanto, MaizeDB, Gramene, SRI International

Where to Get Our Stuff
- ontologies and annotations (FTP site): ftp://ftp.arabidopsis.org/home/tair/Ontologies/
- annotations (search & download)
- literature curation software: PubSearch (download)

Sources of Vocabularies
Literature:
- primary research articles (~12,000)
- textbooks (~10)
- protocols (~150)
- web sites and databases (~50)
Community:
- individual database submissions (e.g. research interests)
- collaborations (e.g. JIC, MaizeDB, Gramene)
- bulk contributions (e.g. Monsanto)