Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology www.gramene.org.


1 Rice Protein and Ontology Database
Rice proteins: data acquisition, curation, resources
Development and integration of controlled vocabulary: Gene Ontology, Trait Ontology, Plant Ontology
www.gramene.org

2 Objectives
– Annotation of rice proteins using the Gene Ontology (GO) concepts of Molecular Function, Biological Process and Cellular Localization
    4,000 rice genes annotated during the project
    Leading to presentation of the Rice Protein Database (RPD) (http://www.gramene.org/perl/protein_search)
– Ontology
    Contribute GO terms for monocot plants
    Develop and curate vocabulary for plant anatomy and developmental stages (PO, Plant Ontology) and for phenotypes or traits (TO, Trait Ontology)

3 Gene mining using the controlled vocabulary
[Diagram: gene mining across Gene, Transcript and Protein using controlled vocabularies. Labels shown, grouped as far as the layout can be recovered:
  GO: Molecular Function (enzyme, others), Biological Process (pathways, reactions, other roles), Localization (cell components)
  PO & TO: Morphology, Anatomy or Histology, Organ (root, shoot, seed), Tissue (meristematic, vascular, ground), Cell type, Cell, Sub-cellular, Sub components, Development, Traits (TO), Agronomic
  Internal CVO: Molecule (organic, inorganic; fats/carbohydrates/proteins/mutagens/others)]

4 [Flowchart of the information sources feeding the Rice Protein Database (RPD). Labels shown, grouped as far as the layout can be recovered:
  Gene Ontology: Molecular function, Biological process, Cellular localization
  Published report: PubMed, BIOSIS, others
  Experimental evidence: direct enzyme assay, expression, mutant/phenotype, physical interaction, complementation, genetic interaction, localization
  Electronic prediction: citation, sequence similarity (Clustal / BLAST), electronic curation information, traceable author statement
  Predictions/identification: Gene Ontology mapping (Gramene & Interpro (EBI)), Pfam, PROSITE, PROTOMAP, ProDom, transmembrane helices, cellular localization, predictions based on HMM, physicochemical properties, 3D-structural alignments
  DBXref / References: GenBank, SWISSPROT, EMBL/DDBJ, other databases
  Sequence entry: Rice Protein Database (RPD)
  Evidence labels: IEA and ISS codes, non-IEA code, link back
  Gramene modules: EnsEMBL Genome Browser sequence (BLAT, features on peptide map), Plant Ontology (anatomy & growth stages), DBXrefs, germplasm bank]

5 Protein page: GenBank/SWISSPROT entry. Get information on:
Name(s): Shows all the names by which the molecule is represented in various databases and in the scientific literature.
E.C. Number(s): Shows the designated Enzyme Commission (E.C.) number. The E.C. numbers link to GenomeNet, Japan, from where further links to biochemical pathways and ligands are accessible (courtesy of the KEGG database).
Gene name(s): Lists all the gene names by which the molecule is called, as designated by the Commission on Plant Gene Nomenclature. If none is available, consider using a systematic name given to the ORF/gene.

6 Protein page: GenBank/SWISSPROT entry. Get information on:
Accession number: The SWISS-PROT accession number, also similar to the "AC" field of a SWALL (EMBL) record and the "ACCESSION" field of a GenBank record for the respective protein entry. It links the protein entry to the other databases, namely the GenBank protein database, SWALL from EMBL, and SWISS-PROT.
Organism: Taxonomic information on the organism from which the protein sequence was derived.
Species: The species of the genus Oryza (presently 23 of 25 species are represented).
Subspecies: The subspecies, indica or japonica, of the rice species Oryza sativa.
Cultivar: The variety/cultivar name from which the sequence was derived; it will link to a germplasm bank (GRIN/IRIS) for further information.

7 Protein page: GenBank/SWISSPROT entry
Map with features: Perform a “Blat” alignment of the rice protein sequences from SWISSPROT against the translated peptides from the Ensembl rice genome sequence database at Gramene. The cut-off score used is 99% identity, and the curator should validate the result. Add the features to the protein structure, a map showing protein domains (e.g. Pfam) and protein features (trans-membrane, low-complexity and coil regions), on the Ensembl peptide report page.
Sequence: Use it for analyses that identify features such as Pfam/Prosite domains and that generate predictions for trans-membrane helices, coiled-coil regions and cellular component localization.
Validation: Based on available CDS features and gene indices/ESTs.
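
The 99% identity cut-off on this slide can be sketched as a simple filter. This is only an illustration under stated assumptions, not Gramene's pipeline: the `Hit` record, its field names and the sample identifiers are hypothetical, and percent identity is taken here as matches divided by aligned length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein_id: str      # hypothetical SWISSPROT/TrEMBL identifier
    translation_id: str  # hypothetical Ensembl peptide identifier
    matches: int         # identical residues in the alignment
    aligned_length: int  # total aligned residues

    @property
    def percent_identity(self) -> float:
        return 100.0 * self.matches / self.aligned_length


def candidate_correspondences(hits, cutoff=99.0):
    """Keep hits at or above the cutoff; a curator still validates them."""
    return [
        h for h in hits
        if h.aligned_length > 0 and h.percent_identity >= cutoff
    ]


# Made-up alignment results, for illustration only.
hits = [
    Hit("PROT_A", "PEP_1", 260, 260),
    Hit("PROT_A", "PEP_2", 250, 260),
    Hit("PROT_B", "PEP_3", 199, 200),
]
kept = candidate_correspondences(hits)
print([(h.protein_id, h.translation_id) for h in kept])
```

Only the first and third hits pass the filter; the second falls below 99% identity and would not be proposed to the curator.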

8 Various tools used by Gramene in annotation of rice gene products
TMHMM results: ftp://www.gramene.org/pub/gramene/protein/feature/Oryza_TMHMM_result.txt
Pfam members in RPD
Prosite members in RPD

9 Rice Functional Information
After identifying a number of features, the curator proceeds to annotate the gene product(s) in the Rice Protein Database:
  Annotate rice gene function using the Gene Ontology (GO) system
  Provide literature citations as evidence for each assertion and classify them using the evidence codes
Gene Ontology is a controlled vocabulary that defines the following concepts for a gene product:
  Molecular function: GO term(s) defining the molecular function of the gene product
  Biological process: GO term(s) defining the biological process
  Cellular component: GO term(s) identifying the localization of the protein in a cell

10 Gene Ontology (GO) Associations: evidence codes applied in the Rice Protein Database
IDA (inferred from direct assay): enzyme assays / in vitro reconstitution / immunofluorescence / cell fractionation / binding assay
IEA (inferred from electronic annotation): feature search / Interpro / Pfam / Prosite / annotations from database records
IEP (inferred from expression pattern): Northerns / microarray data / western blots
IMP (inferred from mutant phenotype): gene mutation / deletion or disruption / over expression / ectopic expression / anti-sense experiments / RNAi experiments / specific protein inhibitors
NR (not recorded): very old annotation
IGI (inferred from genetic interaction): suppressor screens / synthetic lethal / functional complementation / rescue experiments
IPI (inferred from physical interaction): 2-hybrid interactions / 3-hybrid interactions / co-purification / co-immunoprecipitation / affinity interaction
ISS (inferred from sequence or structural similarity): sequence similarity / recognized domains / structural similarity / Southern blotting
NAS (non-traceable author statement): no citation / non-traceable by curator
TAS (traceable author statement): review article / text book / dictionary / website / database
A complete list is available at http://www.gramene.org/plant_ontology/evidence_codes.html
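
The evidence codes on this slide lend themselves to a small lookup table. The sketch below is a hedged illustration: the code names and meanings come from the slide, while the `summarize` helper, the tuple layout and the sample annotations (placeholder protein and GO-term labels) are assumptions made for the example. It splits evidence into IEA and non-IEA, the same split the RPD statistics report.

```python
from collections import Counter

# Evidence codes and meanings as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred from mutant phenotype",
    "NR": "not recorded",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "TAS": "traceable author statement",
}


def is_electronic(code: str) -> bool:
    """True for IEA, the electronic-annotation code."""
    return code.upper() == "IEA"


def summarize(annotations):
    """Count IEA versus non-IEA evidence, rejecting unknown codes."""
    counts = Counter()
    for _protein, _go_term, code in annotations:
        if code.upper() not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
        counts["IEA" if is_electronic(code) else "non-IEA"] += 1
    return dict(counts)


# Hypothetical (protein, GO term, evidence code) records.
sample = [
    ("PROTEIN_A", "GO_TERM_1", "IEA"),
    ("PROTEIN_A", "GO_TERM_2", "IDA"),
    ("PROTEIN_B", "GO_TERM_1", "ISS"),
    ("PROTEIN_B", "GO_TERM_3", "IEA"),
]
print(summarize(sample))
```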

11 Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the GO term (protein page, Gramene Ontology Database)

12 Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the literature citation (evidence for molecular function) (protein page, Gramene Literature Database)

13 Gene Ontology (GO) Associations: the association of protein 1433_ORYSA with the literature citation and evidence codes (protein page)

14 Rice Protein Database (RPD) statistics-1
Total number of associations: 9866 (3321 gene products associated with 781 GO terms)
  Biological Process: 242 terms, 2881 associations
  Molecular Function: 449 terms, 5599 associations
  Cellular Component: 90 terms, 1386 associations
Total number of proteins: 8985
  From SWISSPROT: 397
  From TrEMBL: 8588
Total number of evidences: 21170
  IEA evidences: 20593
  Non-IEA evidences: 577
Total number of references as evidences: 74
GO mappings are based on Interpro-EBI and Gramene curation.
[Charts: Biological process, Molecular function]
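
The totals on this slide can be checked against their parts with a few lines of arithmetic. The numbers below are copied from the slide; the variable names are only illustrative.

```python
# Per-ontology counts as reported on the slide.
terms = {
    "Biological Process": 242,
    "Molecular Function": 449,
    "Cellular Component": 90,
}
associations = {
    "Biological Process": 2881,
    "Molecular Function": 5599,
    "Cellular Component": 1386,
}

total_terms = sum(terms.values())
total_associations = sum(associations.values())
total_proteins = 397 + 8588      # SWISSPROT + TrEMBL
total_evidences = 20593 + 577    # IEA + non-IEA
iea_share = 20593 / total_evidences

# Each sum matches the total stated on the slide.
assert total_terms == 781
assert total_associations == 9866
assert total_proteins == 8985
assert total_evidences == 21170

print(f"IEA share of evidences: {iea_share:.1%}")
```

The parts add up to the stated totals, and roughly 97% of the evidence is electronic (IEA), which is why the manually curated, non-IEA evidence is reported separately.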

15 Rice Protein Database (RPD) statistics-2
Total number of proteins in RPD: 8985
  From SWISS-PROT: 397
  From TrEMBL: 8588
Total number of correspondences between proteins and translations: 7960 (6912 proteins correspond to 7957 translations)
  Proteins with one corresponding translation: 5911
  Proteins with two corresponding translations: 959
  Proteins with three corresponding translations: 37
  Proteins with four corresponding translations: 5
Gene products associated with 781 GO terms: 3321 (refer to previous slide)
Number of Pfam entries: 874
Total number of proteins that have mappings to Pfam: 3663
Number of Prosite entries: 556
Total number of proteins that have mappings to Prosite: 3201
Total number of proteins that have mappings to trans-membrane features: 1583
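
The protein-to-translation figures on this slide can be reproduced from the per-protein breakdown. The sketch below uses only numbers from the slide; the variable names are illustrative.

```python
# Number of proteins having 1, 2, 3 or 4 corresponding translations.
proteins_by_translation_count = {1: 5911, 2: 959, 3: 37, 4: 5}

proteins_with_translation = sum(proteins_by_translation_count.values())
correspondences = sum(
    n_translations * n_proteins
    for n_translations, n_proteins in proteins_by_translation_count.items()
)
coverage = proteins_with_translation / 8985  # share of all RPD proteins

# Both values agree with the totals stated on the slide.
assert proteins_with_translation == 6912
assert correspondences == 7960

print(f"RPD proteins with a translation: {coverage:.1%}")
```

The breakdown yields 7960 correspondences, while the slide counts 7957 distinct translations. One possible reading, not stated in the source, is that a few translations correspond to more than one protein.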

16 Trait Ontology (TO) to describe mutants/phenotypes in rice

17 Plant Ontology resources will be available soon at www.plantontology.org

18 Future plans
  Continue annotation of rice proteins.
  Identify the resources and tools to provide much improved annotation of rice proteins, using HMMs, structure predictions and other tools.
  Develop tools to simplify the process of gene mining using Gramene and other databases by building combination search tools using controlled vocabulary and feature tables.
  Start building up a resource for creating a protein interaction map for the complete rice genome based on association in a biochemical pathway, assembly in a functional complex / interacting partners, proximity on the genome and common regulation mechanism (a possible collaboration).
  Contribute / share the controlled vocabulary for monocots with other databases.
  Develop the necessary tools and host the resource pages for the Plant Ontology Consortium.
  Collaborate with the Gene Ontology Consortium on various aspects of ontology development and curation.

