Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology www.gramene.org.

Presentation transcript:

Rice Protein and Ontology Database
- Rice proteins: data acquisition, curation, resources
- Development and integration of controlled vocabulary: Gene Ontology, Trait Ontology, Plant Ontology

Objectives
- Annotation of rice proteins using the Gene Ontology (GO) concepts of Molecular Function, Biological Process and Cellular Localization
  - 4,000 rice genes annotated during the project
  - Leading to presentation of the Rice Protein Database (RPD)
- Ontology
  - Contribute GO terms for monocot plants
  - Develop and curate vocabulary for plant anatomy and developmental stages (PO, Plant Ontology) and for phenotypes or traits (TO, Trait Ontology)

Gene mining using the controlled vocabulary
[Diagram. Labels recovered from the slide, grouped by the vocabulary they appear under:]
- Entities: Gene, Transcript, Protein
- GO: Localization (cell components), Molecular Function (enzyme, others), Biological Process (pathways, reactions, other roles)
- PO & TO: Anatomy or histology (organ: root, shoot, seed; tissue: meristematic, vascular, ground; cell type; cell; sub-cellular), Morphology, Development, Sub components, Traits (TO), Agronomic
- Internal CVO: Molecule (organic, inorganic; fats/carbohydrates/proteins/mutagens/others)

[Workflow diagram. Labels recovered from the slide, grouped by box:]
- Gene Ontology: Molecular function, Biological process, Cellular localization
- Published report: PubMed, BIOSIS, others
- Experimental evidence: direct enzyme assay, expression, mutant/phenotype, physical interaction, complementation, genetic interaction, localization
- Electronic prediction: citation, sequence similarity
- Electronic curation information: sequence similarity (Clustal / BLAST), traceable author statement, predictions/identification
- Gene Ontology mapping: Gramene & InterPro (EBI)
- Feature predictions: Pfam, PROSITE, PROTOMAP, ProDom, transmembrane helices, cellular localization, predictions based on HMM, physicochemical properties, 3D-structural alignments
- DBXref / References: GenBank, SWISSPROT, EMBL/DDBJ, other databases
- Sequence entry: Rice Protein Database (RPD)
- Ensembl Genome Browser: sequence, BLAT, features on peptide map
- Evidence codes: IEA and ISS codes; non-IEA code
- Plant Ontology: anatomy & growth stages (non-IEA code)
- Other links: link back, DBXrefs, germplasm bank, Gramene modules

Protein page: get information on the GenBank/SWISSPROT entry
- Name(s): Shows all the names by which the molecule is represented in various databases and in the scientific literature.
- E.C. Number(s): Shows the designated Enzyme Commission (E.C.) number. The E.C. numbers link to GenomeNet, Japan, from where further links to biochemical pathways and ligands are accessible.
- Gene name(s): Lists all the gene names by which the molecule is called, as designated by the Commission on Plant Gene Nomenclature. If none is available, consider using the systematic name given to the ORF/gene.
Courtesy: KEGG database

Protein page: get information on the GenBank/SWISSPROT entry
- Accession number: The SWISS-PROT accession number, similar to the "AC" field of the SWALL (EMBL) record and the "ACCESSION" field of the GenBank record for the respective protein entry. It links the protein entry to the other databases, namely the GenBank protein database, SWALL from EMBL, and SWISS-PROT.
- Organism: The taxonomic information on the organism from which the protein sequence was derived.
- Species: The species of the genus Oryza (presently 23 of 25 species are represented).
- Subspecies: The subspecies, indica or japonica, of the rice species Oryza sativa.
- Cultivar: The variety/cultivar name from which the sequence was derived; it will link to a germplasm bank (GRIN/IRIS) for further information.

Protein page: GenBank/SWISSPROT entry
- Sequence: Use it for analyses that identify features such as Pfam / Prosite domains and that generate predictions for trans-membrane helices, coiled-coil regions and cellular component localization.
- Validation: Perform a "Blat" alignment of the rice protein sequences from SWISSPROT against the translated peptides from the Ensembl rice genome sequence database at Gramene. The cut-off score used is 99% identity. The curator should validate the result, based on available CDS features and gene indices/ESTs.
- Map with features: Add the features to the protein structure, a map showing protein domains (e.g. Pfam) and protein features (trans-membrane, low complexity and coil regions) on the Ensembl peptide report page.
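
The 99% identity cut-off described above can be illustrated with a small filter. This is only a sketch: the `Alignment` record, its field names, the identifiers and the definition of identity (identical residues over aligned length) are assumptions made for illustration, not the layout of real Blat output or the Gramene pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One protein-to-translation alignment (hypothetical record layout)."""
    protein_id: str       # placeholder for a SWISSPROT entry name
    translation_id: str   # placeholder for an Ensembl peptide identifier
    matches: int          # identical residues in the alignment
    aligned_length: int   # aligned residues, including mismatches


def percent_identity(aln: Alignment) -> float:
    """Identity in percent, assumed here to be matches / aligned length."""
    if aln.aligned_length == 0:
        return 0.0
    return 100.0 * aln.matches / aln.aligned_length


def candidates_for_curation(alignments, cutoff=99.0):
    """Keep alignments at or above the cut-off; a curator still validates them."""
    return [a for a in alignments if percent_identity(a) >= cutoff]


# Made-up example hits, not real data.
hits = [
    Alignment("PROT_A", "TRANS_1", matches=260, aligned_length=260),
    Alignment("PROT_A", "TRANS_2", matches=250, aligned_length=260),
    Alignment("PROT_B", "TRANS_3", matches=198, aligned_length=200),
]
kept = candidates_for_curation(hits)
print([(a.protein_id, a.translation_id) for a in kept])
```

Only the first and third hits pass the cut-off; the second, at roughly 96% identity, would be left out before manual validation.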

Various tools used by Gramene in annotation of rice gene products ftp:// Pfam members in RPD Prosite members in RPD

Rice Functional Information
- Annotate rice gene function using the Gene Ontology (GO) system.
- Provide literature citations as evidence for each assertion and classify them using the evidence codes.
Gene Ontology is a controlled vocabulary to define the following concepts for a gene product:
- Molecular function: GO term(s) defining the molecular function of the gene product
- Biological process: GO term(s) defining the biological process
- Cellular component: GO term(s) identifying the localization of the protein in a cell
After identifying a number of features, the curator finally proceeds to annotate the gene product(s) in the Rice Protein Database.
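
One way to picture a single annotation is as a record that ties a gene product to a GO term under one of the three concepts, with an evidence code and a citation. The sketch below is hypothetical: the class, its field names, the placeholder term identifiers and the placeholder citation are invented for illustration and do not reflect the actual RPD schema or any real annotation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass(frozen=True)
class GoAnnotation:
    """A single gene product to GO term association (hypothetical layout)."""
    gene_product: str
    aspect: str                      # one of ASPECTS
    go_term: str                     # placeholder identifiers below, not real terms
    evidence_code: str               # e.g. "IDA", "IEA", "TAS"
    reference: Optional[str] = None  # literature citation, when one exists

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect!r}")


def terms_by_aspect(annotations):
    """Group the GO terms of a set of annotations by aspect."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.aspect].append(ann.go_term)
    return dict(grouped)


# Placeholder annotations for illustration only.
annotations = [
    GoAnnotation("1433_ORYSA", "molecular_function", "GO:placeholder-1", "IDA",
                 "hypothetical citation"),
    GoAnnotation("1433_ORYSA", "cellular_component", "GO:placeholder-2", "IEA"),
]
print(terms_by_aspect(annotations))
```

The reference is optional in this sketch because some evidence codes, such as electronic annotation, are not backed by a literature citation.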

Gene Ontology (GO) Associations
EVIDENCE CODES APPLIED IN RICE PROTEIN DATABASE
- IDA (inferred from direct assay): enzyme assays / in vitro reconstitution / immunofluorescence / cell fractionation / binding assay
- IEA (inferred from electronic annotation): feature search / Interpro / Pfam / Prosite / annotations from database records
- IEP (inferred from expression pattern): Northerns / microarray data / western blots
- IMP (inferred from mutant phenotype): gene mutation / deletion or disruption / over expression / ectopic expression / anti-sense experiments / RNAi experiments / specific protein inhibitors
- NR (not recorded): very old annotation
- IGI (inferred from genetic interaction): suppressor screens / synthetic lethal / functional complementation / rescue experiments
- IPI (inferred from physical interaction): 2-hybrid interactions / 3-hybrid interactions / co-purification / co-immunoprecipitation / affinity interaction
- ISS (inferred from sequence or structural similarity): sequence similarity / recognized domains / structural similarity / Southern blotting
- NAS (non-traceable author statement): no citation / non-traceable by curator
- TAS (traceable author statement): review article / text book / dictionary / website / database
A complete list is available at
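
The evidence codes above lend themselves to a simple lookup table. The following sketch copies the codes and their expansions from the slide; the helper functions and the sample list of codes are assumptions added for illustration, showing how evidences might be split into IEA and non-IEA groups.

```python
from collections import Counter

# Codes and expansions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred from mutant phenotype",
    "NR": "not recorded",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "TAS": "traceable author statement",
}


def describe(code: str) -> str:
    """Return the expansion of an evidence code, rejecting unknown codes."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None


def split_iea(codes):
    """Count electronic (IEA) versus all other evidences."""
    counts = Counter()
    for code in codes:
        describe(code)  # validates the code
        counts["IEA" if code.upper() == "IEA" else "non-IEA"] += 1
    return counts["IEA"], counts["non-IEA"]


# Made-up list of evidence codes, not RPD data.
sample = ["IEA", "IEA", "IDA", "TAS", "ISS", "IEA"]
print(describe("IMP"))
print(split_iea(sample))
```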

Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the GO term [screenshots: Protein page, Gramene Ontology Database]

Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the literature citation (EVIDENCE for molecular function) [screenshots: Protein page, Gramene Literature Database]

Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the literature citation and EVIDENCE CODES [screenshot: Protein page]

Rice Protein Database (RPD) statistics-1
GO mappings are based on Interpro-EBI and Gramene curation.
- Total number of associations: 9866 (3321 gene products associated with 781 GO terms)
  - Biological Process: 242 terms, 2881 associations
  - Molecular Function: 449 terms, 5599 associations
  - Cellular Component: 90 terms, 1386 associations
- Total number of proteins: 8985
  - Number of proteins from SWISSPROT: 397
  - Number of proteins from TrEMBL: 8588
- Total number of evidences:
  - Total number of IEA evidences:
  - Total number of non-IEA evidences: 577
- Total number of references as evidences: 74
[Charts: Biological process, Molecular function]
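
As a quick consistency check, the per-category figures on this slide add up to the stated totals. The short sketch below only re-adds numbers quoted on the slide; the variable names are illustrative.

```python
# (GO terms, associations) per category, as quoted on the slide.
per_aspect = {
    "Biological Process": (242, 2881),
    "Molecular Function": (449, 5599),
    "Cellular Component": (90, 1386),
}

total_terms = sum(terms for terms, _ in per_aspect.values())
total_associations = sum(assoc for _, assoc in per_aspect.values())
total_proteins = 397 + 8588  # SWISSPROT + TrEMBL

print(total_terms)         # 781
print(total_associations)  # 9866
print(total_proteins)      # 8985
```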

Rice Protein Database (RPD) statistics-2
- Total number of proteins in RPD: 8985
  - Number of proteins from SWISS-PROT: 397
  - Number of proteins from TrEMBL: 8588
- Total number of correspondences between proteins and translations: 7960 (6912 proteins correspond to 7957 translations)
  - Proteins with only one corresponding translation: 5911
  - Proteins with two corresponding translations: 959
  - Proteins with three corresponding translations: 37
  - Proteins with four corresponding translations: 5
- Gene products associated with 781 GO terms: 3321 (refer to previous slide)
- Number of Pfam entries: 874
  - Total number of proteins that have mappings to Pfam: 3663
- Number of Prosite entries: 556
  - Total number of proteins that have mappings to Prosite: 3201
- Total number of proteins that have mappings to trans-membrane features:
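
The breakdown of proteins by number of corresponding translations can be re-added to recover the two totals quoted on this slide. The sketch below uses only figures from the slide; the dictionary and variable names are illustrative. The figure of 7957 translations cannot be derived from this breakdown and is not checked here.

```python
# Number of proteins having 1, 2, 3 or 4 corresponding translations.
proteins_by_translation_count = {1: 5911, 2: 959, 3: 37, 4: 5}

# Each protein is counted once, whatever its number of translations.
proteins_with_translation = sum(proteins_by_translation_count.values())

# Each protein contributes one correspondence per translation.
correspondences = sum(
    n_translations * n_proteins
    for n_translations, n_proteins in proteins_by_translation_count.items()
)

print(proteins_with_translation)  # 6912
print(correspondences)            # 7960
```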

Trait Ontology (TO) to describe Mutants/phenotypes in rice

PLANT ONTOLOGY resources will be available soon

Future plans
- Continue annotation of rice proteins.
- Identify the resources and tools that provide much improved annotation of rice proteins, using HMMs, structure predictions and other tools.
- Develop tools to simplify the process of gene mining using Gramene and other databases, by building combination search tools that use controlled vocabulary and feature tables.
- Start building a resource for creating a protein interaction map for the complete rice genome, based on association in a biochemical pathway, assembly in a functional complex / interacting partners, proximity on the genome and common regulation mechanism (a possible collaboration).
- Contribute / share the controlled vocabulary for monocots with other databases.
- Develop the necessary tools and host the resource pages for the Plant Ontology Consortium.
- Collaborate with the Gene Ontology Consortium on various aspects of ontology development and curation.