Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.

Slides:



Advertisements
Similar presentations
A Comparative mapping resource ONTOLOGY DEVELOPMENT AND INTEGRATION IN GRAMENE Pankaj Jaiswal Cornell University.
Advertisements

1 Gene Ontology and Functional Annotation Donghui Li ASPB Plant Biology, June 29, 2008, Merida.
Annotation of Gene Function …and how thats useful to you.
Applications of GO. Goals of Gene Ontology Project.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
Gene Ontology John Pinney
Gene function analysis Stem Cell Network Microarray Course, Unit 5 May 2007.
Introduction to Functional Analysis J.L. Mosquera and Alex Sanchez.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
COG and GO tutorial.
Genome analysis and annotation Part II. THE INSTITUTE FOR GENOMIC RESEARCH TIGRTIGR Evidence View S.mansoni PASA assemblies S. japonicum EST alignments.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Biology 224 Dr. Tom Peavy Sept 27 & 29 Protein Structure & Analysis- part 2.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Today’s menu: -SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Comprehensive Annotation System for Infectious Disease Data Alexander Diehl University at Buffalo/The Jackson Laboratory IDO Workshop /9/2010.
Protein and Function Databases
CACAO - Penn State Gene Function and Gene Ontology January 2011
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Methods for Creating GO Annotations Emily Dimmer European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge UK.
A Common Language for Annotation of Genes from Yeast, Flies and Mice The Gene Ontologies …and Plants and Worms …and Humans …and anything else!
SPH 247 Statistical Analysis of Laboratory Data 1 May 12, 2015 SPH 247 Statistical Analysis of Laboratory Data.
Using The Gene Ontology: Gene Product Annotation.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
CACAO training part 1 Jim Hu and Suzi Aleksander For UW Parkside Fall 2014.
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.
Introduction to GO Annotation Eurie Hong (SGD), Michelle Gwinn (TIGR), Tanya Berardini (TAIR), Karen Pilcher (DictyBase), Russell Collins (FlyBase), Carol.
Biology 224 Instructor: Tom Peavy Feb 21 & 26, Protein Structure & Analysis.
SPH 247 Statistical Analysis of Laboratory Data 1May 14, 2013SPH 247 Statistical Analysis of Laboratory Data.
Ontologies, data standards and controlled vocabularies.
GENE ONTOLOGY FOR THE NEWBIES Suparna Mundodi, PhD The Arabidopsis Information Resources, Stanford, CA.
GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL.
Gene expression analysis
EBI is an Outstation of the European Molecular Biology Laboratory. GOA: Looking after GO annotations Emily Dimmer Gene Ontology Annotation (GOA) Database.
Lecture Four: GO: The Gene Ontology ----Infrastructure for Systems Biology.
BIOINFORMATIK I UEBUNG 2 mRNA processing.
Monday, November 8, 2:30:07 PM  Ontology is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories.
Emily Dimmer GOA group European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge UK Gene Ontology (GO)
From Functional Genomics to Physiological Model: Using the Gene Ontology Fiona McCarthy, Shane Burgess, Susan Bridges The AgBase Databases, Institute of.
Manual GO annotation Evidence: Source AnnotationsProteins IEA:Total Manual: Total
Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
SRI International Bioinformatics 1 Submitting pathway to MetaCyc Ron Caspi.
24th Feb 2006 Jane Lomax GO Further. 24th Feb 2006 Jane Lomax GO annotations Where do the links between genes and GO terms come from?
Gene Product Annotation using the GO ml Harold J Drabkin Senior Scientific Curator The Jackson Laboratory.
Part II GO-Vocabulary of Genome. S. cerevisiae D. melanogaster.
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Tutorial 7 Gene expression analysis 1. Expression data –GEO –UCSC –ArrayExpress General clustering methods –Unsupervised Clustering Hierarchical clustering.
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Functional Annotation and Functional Enrichment. Annotation Structural Annotation – defining the boundaries of features of interest (coding regions, regulatory.
1 Gene function annotation. 2 Outline  Functional annotation  Controlled vocabularies  Functional annotation at TAIR  Resources and tools at TAIR.
Getting Started: a user’s guide to the GO TAMU GO Workshop 17 May 2010.
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Introduction to the GO: a user’s guide NCSU GO Workshop 29 October 2009.
Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI
CACAO Training Jim Hu and Suzi Aleksander Fall 2015.
1 Annotation EPP 245/298 Statistical Analysis of Laboratory Data.
Getting GO: how to get GO for functional modeling Iowa State Workshop 11 June 2009.
An example of GO annotation from a primary paper Rebecca E. Foulger (UniProt Curator) GO Annotation Camp, June 2005 PMID:
An example of GO annotation from a primary paper GO Annotation Camp, July 2006 PMID:
CACAO Training Jim Hu and Suzi Aleksander Fall 2015.
CACAO Training ASM-JGI 2012.
Annotating with GO: an overview
Introduction to the Gene Ontology
Gene expression analysis
Annotating Gene Products to the GO
Browsing the GO at MGI Harold Drabkin, Ph.D. Senior Scientific Curator
Insight into GO and GOA Angelica Tulipano , INFN Bari CNR
Presentation transcript:

Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse Genome Informatics Bar Harbor, ME

What is an annotation? An annotation is a statement that a gene product … …has a particular molecular function …is involved in a particular biological process …is located within a certain cellular component …as determined by a particular method …as described in a particular reference. Evidence Code Evidence Code GO Term GO Term Reference Smith et al. determined by a direct assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.

Anatomy of an annotation Object (previously mentioned) GO Term from most recent GO –GO Term Qualifier (optional) NOT, Co_localizes with, or Contributes_to Evidence Code : IDA, IPI, IMP, IEP, IGI, ISS, IEA, TAS, NAS, or IC –Evidence Code Qualifier (required for some codes ) Used in combination with IPI, IMP, IGI, and ISS –Seq_ID or DB_ID required. Reference: literature or database specific reference –DB_ID or PMID

Getting the GO

GO Evidence Codes CodeDefinition IEAInferred from Electronic Annotation NASNon-traceable Author Statement TASTraceable Author Statement NDNo Data Use with annotation to unknown IDAInferred from Direct Assay *IPIInferred from Physical Interaction *IGIInferred from Genetic Interaction IMPInferred from Mutant Phenotype IEPInferred from Expression Pattern *ICInferred from Curator *ISSInferred from Sequence Similarity Manually annotated

Annotation Strategies Electronic (IEA) –Good for first pass Usually based on some sort of sequence comparisons (but use ISS if paper based) –IP2GO (InterPro to GO –SPTR2GO (SwissProt to GO) Manual (literature)

Literature selection A paper is selected for GO curation of a mouse gene product if: –A paper gives direct experimental evidence for the normal function, process, or cellular location of a mouse* gene product (IDA, IMP, IGG, IPI). –A paper gives direct experimental evidence for the normal function, process, or cellular location of a non-mouse gene product AND the paper presents homology data to a mouse gene product (ISS)

Annotation process READ the full papers! –Abstracts alone can be very misleading Quite often, the species are not specified. Sometimes a paper uses human, mouse and rat interchangeably, or uses human for one gene and mouse for a different one.

Example Annotations

Abstract suggests that this paper demonstrates that Ibtk –Binds to a protein kinase –Inhibits kinase activity –Inhibits calcium mobolization –Inhibits transcription

Evidence used for process and function Use most specific term possible Both IDA

Both Btk and iBtk have protein binding activity to each other, IPI evidence code

IDA evidence code

Abstract totally misses the sub-cellular localization!!!

Sharing Annotations The Gene Association File

Annotation Sharing Amigo Browser: –A GO browser that tracks contributed GO annotations across species. –Uses annotation sets supplied in a specific format.

The Gene Association files 15 column tab delimited text file

Anatomy of a gene association file ColumnContentExample 1DBSGD, MGI 2DB_Object IDMGI: DB_Object_SymbolGras3 4GO_ID QualifierNOT, co_localizes_with, contributes_to 5GO_IDGO: DB_RefPMID: Evidence_CodeIDA, etc. 8With/From 9GO_aspectP (process), C (component) F (function) 10DB_Object_NameGrasshopper 3 homlog 11DB_Object_SynonymLocust III, E12Rik 12DB_Object_TypeGene, transcript, or protein 13Taxon taxon: Date Assigned_byDB (usually same as column 1)

Some Special Cases

Annotate to finest granularity Annotating to GO: automatically annotates to all of its parents; thus a product is annotated to both protein modification AND cytoskeleton organization

GO Does not annotate substrates A gene product that has protein kinase activity is also involved in the process of protein phosphorylation The protein that gets phosphorylated is NOT involved in the process of protein phosphorylation.

Qualifiers GO Term Qualifiers –“NOT” Can be used with any term –“contributes_to” Used for molecular function –“co_localizes with” Used with cellular component Evidence Code Qualifiers –Sequence ID (for ISS) –Protein ID (for IPI and protein binding) –Mutant ID (for IMP) –Gene (for IGI) –GO ID (for IC)

'NOT' is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). e.g. This protein does not have ‘kinase activity’ because the Author states that this protein has a disrupted/missing an ‘ATP binding’ domain. Also used to document conflicting claims in the literature. NOT can be used with ALL three GO Ontologies. The “not” GO Term Qualifier

The ‘contributes_to’ qualifier The Qualifier documentation: Contributes_to: An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex‘. This qualifer allows us to distinguish the individual subunit from complex functions e.g. contributes_to ribosome binding when part of a complex but does not perform this function on its own. All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity. Only used with GO Function Ontology

GO: Protein Binding Used to annotate a gene product as being able to bind another protein –If the target protein is known, then use the IPI evidence code and the UniProt identifier in the “with” field. –If the target is not known, then use the IDA evidence code. The gene product being annotated does not have to be a protein itself: eg: Rpph1, ribonuclease P RNA component H1, has protein binding activity (GO: )

ISS:Inferred from sequence similarity Used by MGI curators –A direct experiment must have been performed in the non-mouse organism If the sequence comparison and the experiment are in one paper,then the reference is the paper If the orthology is MGI curated, then the reference is J: The experimental paper reference goes in a note. Author states Orthology MGI curates Orthology

IMP:Inferred from mutant phenotype Mostly used in inferring function from knock-out mice Uses the WITH field. Abnormal branching of submandibular gland

Used where an annotation is not supported by any evidence, but can be reasonably inferred by a curator from other GO annotations, for which evidence is available. The ‘with’ field is required, and is populated by a GO id using the same reference Example: Ref. 1 shows that a gene product has chloride channel activity (GO: :) by direct assay (IDA). A curator can then add the component annotation ‘integral to membrane’ (GO: ) using the IC evidence code and put GO: in the “with” field. Caution: The IC evidence code should not be used for something obvious. For example, if a gene product is being annotated to the function “protein kinase activity” (GO: ) by IDA, then it is also involved in the process “protein amino acid phosphorylation” (GO: ) by the same experiment (IDA). Inferred from Curator (IC)

Unknown v.s. Unannotated GO has three terms to be used when the curator has determined that there is no existing literature to support an annotation. –Biological_process unknown GO: –Molecular_function unknown GO: –Cellular_component unknown GO: These are NOT the same as having no annotation at all. –No annotation means that no one has looked yet.