Annotating Gene Products to the GO

Slides:



Advertisements
Similar presentations
1 Gene Ontology and Functional Annotation Donghui Li ASPB Plant Biology, June 29, 2008, Merida.
Advertisements

Annotation of Gene Function …and how thats useful to you.
Applications of GO. Goals of Gene Ontology Project.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.
Gene Ontology John Pinney
Gene function analysis Stem Cell Network Microarray Course, Unit 5 May 2007.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
COG and GO tutorial.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
Comprehensive Annotation System for Infectious Disease Data Alexander Diehl University at Buffalo/The Jackson Laboratory IDO Workshop /9/2010.
Protein and Function Databases
BICH CACAO Biocurator Training Session #3.
CACAO - Penn State Gene Function and Gene Ontology January 2011
Methods for Creating GO Annotations Emily Dimmer European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge UK.
SPH 247 Statistical Analysis of Laboratory Data 1 May 12, 2015 SPH 247 Statistical Analysis of Laboratory Data.
Using The Gene Ontology: Gene Product Annotation.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
CACAO training part 1 Jim Hu and Suzi Aleksander For UW Parkside Fall 2014.
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.
Introduction to GO Annotation Eurie Hong (SGD), Michelle Gwinn (TIGR), Tanya Berardini (TAIR), Karen Pilcher (DictyBase), Russell Collins (FlyBase), Carol.
SPH 247 Statistical Analysis of Laboratory Data 1May 14, 2013SPH 247 Statistical Analysis of Laboratory Data.
Ontologies, data standards and controlled vocabularies.
GENE ONTOLOGY FOR THE NEWBIES Suparna Mundodi, PhD The Arabidopsis Information Resources, Stanford, CA.
The Gene Ontology project Jane Lomax. Ontology (for our purposes) “an explicit specification of some topic” – Stanford Knowledge Systems Lab Includes:
Gene Ontology Project
Gene Ontology TM (GO) Consortium Jennifer I Clark EMBL Outstation - European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SD, UK Objectives:
1 SRI International Bioinformatics GO Term Integration and Curation in Pathway Tools and EcoCyc Ingrid M. Keseler Bioinformatics Research Group SRI International.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Monday, November 8, 2:30:07 PM  Ontology is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories.
From Functional Genomics to Physiological Model: Using the Gene Ontology Fiona McCarthy, Shane Burgess, Susan Bridges The AgBase Databases, Institute of.
Manual GO annotation Evidence: Source AnnotationsProteins IEA:Total Manual: Total
Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
SRI International Bioinformatics 1 Submitting pathway to MetaCyc Ron Caspi.
24th Feb 2006 Jane Lomax GO Further. 24th Feb 2006 Jane Lomax GO annotations Where do the links between genes and GO terms come from?
Gene Product Annotation using the GO ml Harold J Drabkin Senior Scientific Curator The Jackson Laboratory.
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Functional Annotation and Functional Enrichment. Annotation Structural Annotation – defining the boundaries of features of interest (coding regions, regulatory.
1 Gene function annotation. 2 Outline  Functional annotation  Controlled vocabularies  Functional annotation at TAIR  Resources and tools at TAIR.
Getting Started: a user’s guide to the GO TAMU GO Workshop 17 May 2010.
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Introduction to the GO: a user’s guide NCSU GO Workshop 29 October 2009.
Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI
CACAO Training Jim Hu and Suzi Aleksander Fall 2015.
SRI International Bioinformatics 1 Editing Pathway/Genome Databases Ron Caspi.
1 Annotation EPP 245/298 Statistical Analysis of Laboratory Data.
Getting GO: how to get GO for functional modeling Iowa State Workshop 11 June 2009.
An example of GO annotation from a primary paper Rebecca E. Foulger (UniProt Curator) GO Annotation Camp, June 2005 PMID:
Protein sequence databases Petri Törönen Shamelessly copied from material done by Eija Korpelainen This also includes old material from my thesis
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
An example of GO annotation from a primary paper GO Annotation Camp, July 2006 PMID:
Gene Ontology TM (GO) Consortium
CACAO Training Jim Hu and Suzi Aleksander Fall 2015.
Gene Annotation & Gene Ontology
CACAO Training ASM-JGI 2012.
Annotating with GO: an overview
D D ? ? D D ? ?Check sequences TAIR? E. coli? ? ISS IDA, IMP, IGI, IC
GO : the Gene Ontology & Functional enrichment analysis
Introduction to the Gene Ontology
Department of Genetics • Stanford University School of Medicine
Modified from slides from Jim Hu and Suzi Aleksander Spring 2016
Translation 2.7 & 7.3.
Annotation: linking literature to gene products
What is an Ontology An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common.
Gene expression analysis
Insight into GO and GOA Angelica Tulipano , INFN Bari CNR
Presentation transcript:

Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse Genome Informatics Bar Harbor, ME http://www.geneontology.org/GO.annotation.html

What is an annotation? GO Term An annotation is a statement that a gene product … …has a particular molecular function …is involved in a particular biological process …is located within a certain cellular component …as determined by a particular method …as described in a particular reference. Evidence Code Reference Smith et al. determined by a direct assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.

Anatomy of an annotation Object (previously mentioned) GO Term from most recent GO GO Term Qualifier (optional) NOT, Co_localizes with, or Contributes_to Evidence Code : IDA, IPI, IMP, IEP, IGI, ISS, IEA, TAS, NAS, or IC Evidence Code Qualifier (required for some codes) Used in combination with IPI, IMP, IGI, and ISS Seq_ID or DB_ID required. Reference: literature or database specific reference DB_ID or PMID

Getting the GO http://www.godatabase.org http://www.ebi.ac.uk/ego http://www.informatics.jax.org/searches/GO_form.html http://www.ebi.ac.uk/ego

GO Evidence Codes Code Definition IEA Inferred from Electronic Annotation NAS Non-traceable Author Statement TAS Traceable Author Statement ND No Data Use with annotation to unknown IDA Inferred from Direct Assay *IPI Inferred from Physical Interaction *IGI Inferred from Genetic Interaction IMP Inferred from Mutant Phenotype IEP Inferred from Expression Pattern *IC Inferred from Curator *ISS Inferred from Sequence Similarity Manually annotated

Annotation Strategies Electronic (IEA) Good for first pass Usually based on some sort of sequence comparisons (but use ISS if paper based) IP2GO (InterPro to GO SPTR2GO (SwissProt to GO) Manual (literature)

Literature selection A paper is selected for GO curation of a mouse gene product if: A paper gives direct experimental evidence for the normal function, process, or cellular location of a mouse* gene product (IDA, IMP, IGG, IPI). A paper gives direct experimental evidence for the normal function, process, or cellular location of a non-mouse gene product AND the paper presents homology data to a mouse gene product (ISS) The two main keywords: DIRECT and NORMAL Or organism of your choice!!!

Annotation process READ the full papers! Abstracts alone can be very misleading Quite often, the species are not specified. Sometimes a paper uses human, mouse and rat interchangeably , or uses human for one gene and mouse for a different one. I highly caution against using abstracts as the sole source of GO.

Example Annotations

Abstract suggests that this paper demonstrates that Ibtk Note also that the actually species used for any assays is NOT specifically mentioned! Abstract suggests that this paper demonstrates that Ibtk Binds to a protein kinase Inhibits kinase activity Inhibits calcium mobolization Inhibits transcription

Evidence used for process and function Use most specific term possible Both IDA Same piece of data (IDA) demonstrates that this gene product inhibit protein kinase activity and thus is involved in the negative regulation of protein amino acid phosphorylation

Both Btk and iBtk have protein binding activity to each other, IPI evidence code

IDA evidence code This paper demonstrates that expression of the protein down-regulates calcium mobilization. Using the term GO:00051209 is misleading, because it is not known if this protein DIRECTLY effects this process. It would be safer to use a term for negative regulation of …., which doesn’t exist. The curator should request that the term be created and re-visit the annotation.

Abstract totally misses the sub-cellular localization!!!

The Gene Association File Sharing Annotations The Gene Association File

Annotation Sharing Amigo Browser: http://www.godatabase.org A GO browser that tracks contributed GO annotations across species. Uses annotation sets supplied in a specific format.

The Gene Association files 15 column tab delimited text file

Anatomy of a gene association file Column Content Example 1 DB SGD, MGI 2 DB_Object ID MGI:1234568 3 DB_Object_Symbol Gras3 4 GO_ID Qualifier NOT, co_localizes_with, contributes_to 5 GO_ID GO:0001515 6 DB_Ref PMID:234567 7 Evidence_Code IDA, etc. 8 With/From 9 GO_aspect P (process), C (component) F (function) 10 DB_Object_Name Grasshopper 3 homlog 11 DB_Object_Synonym Locust III, 0122345E12Rik 12 DB_Object_Type Gene, transcript, or protein 13 Taxon taxon:4932 14 Date 20050101 15 Assigned_by DB (usually same as column 1)

Some Special Cases

Annotate to finest granularity Annotating to GO:0030047 automatically annotates to all of its parents; thus a product is annotated to both protein modification AND cytoskeleton organization

GO Does not annotate substrates A gene product that has protein kinase activity is also involved in the process of protein phosphorylation The protein that gets phosphorylated is NOT involved in the process of protein phosphorylation.

Qualifiers GO Term Qualifiers Evidence Code Qualifiers “NOT” Can be used with any term “contributes_to” Used for molecular function “co_localizes with” Used with cellular component Evidence Code Qualifiers Sequence ID (for ISS) Protein ID (for IPI and protein binding) Mutant ID (for IMP) Gene (for IGI) GO ID (for IC)

The “not” GO Term Qualifier 'NOT' is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). e.g. This protein does not have ‘kinase activity’ because the Author states that this protein has a disrupted/missing an ‘ATP binding’ domain. Also used to document conflicting claims in the literature. NOT can be used with ALL three GO Ontologies.

The ‘contributes_to’ qualifier Contributes_to: An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex‘. This qualifer allows us to distinguish the individual subunit from complex functions e.g. contributes_to ribosome binding when part of a complex but does not perform this function on its own. All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity. Only used with GO Function Ontology contributes_to: An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex', and is a way to capture information about what a complex does in the absence of database objects and identifiers representing complexes. Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of complex subunits that are not known to possess the activity of the complex must include the entry 'contributes_to' in the 'Qualifier' column (see below). All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity. Note that 'contributes_to' is not needed to annotate a catalytic subunit. Furthermore, 'contributes_to' may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not. Examples: Subunits of nuclear RNA polymerases: none of the individual subunits have RNA polymerase activity, yet all of these subunits are annotated to 'DNA-dependent RNA polymerase activity' (with the 'contributes_to' note), to capture the activity of the complex. Another example is ATP citrate lyase (ACL) in Arabidopsis: it is a heterooctamer, composed of two types of subunits, ACLA and ACLB in a A(4)B(4) stoichiometry. Neither of the subunits expressed alone give ACL activity, but co-expression results in ACL activity. Both subunits can be annotated to ATP citrate lyase activity. eIF2: has three subunits (alpha, beta, gamma); one binds GTP; one binds RNA; the whole complex binds the ribosome (all three subunits are required for ribosome binding). So one subunit is annotated to 'GTP binding' and one to 'RNA binding' without qualifiers, and all three are annotated to 'ribosome binding', with the contributes_to qualifier. And all three are annotated to the component term for eIF2 complex. The Qualifier documentation: http://www.geneontology.org/GO.annotation.html

GO:0005515 Protein Binding Used to annotate a gene product as being able to bind another protein If the target protein is known, then use the IPI evidence code and the UniProt identifier in the “with” field. If the target is not known, then use the IDA evidence code. The gene product being annotated does not have to be a protein itself: eg: Rpph1, ribonuclease P RNA component H1, has protein binding activity (GO:0005515)

ISS:Inferred from sequence similarity Author states Orthology Used by MGI curators A direct experiment must have been performed in the non-mouse organism If the sequence comparison and the experiment are in one paper,then the reference is the paper If the orthology is MGI curated, then the reference is J:73065. The experimental paper reference goes in a note. MGI curates Orthology

IMP:Inferred from mutant phenotype Mostly used in inferring function from knock-out mice Uses the WITH field. Abnormal branching of submandibular gland

Inferred from Curator (IC) Used where an annotation is not supported by any evidence, but can be reasonably inferred by a curator from other GO annotations, for which evidence is available. The ‘with’ field is required, and is populated by a GO id using the same reference Example: Ref. 1 shows that a gene product has chloride channel activity (GO:0005254:) by direct assay (IDA). A curator can then add the component annotation ‘integral to membrane’ (GO:0016021) using the IC evidence code and put GO:0005254 in the “with” field. Some databases are doing IC based on ISS annotation?????? Should GOA allow this? Often we find it hard to find evidence to fill in the Component Ontology but we may know that ‘DNA binding’ occurs in the ‘nucleus’ or that ‘cytokines’ are secreted and therefore act ‘extracellular’. The IC code has been included in the GO evidence codes to allow us to use our curator judgement to annotate to GO ontologies base on a GO term we do have evidence for. Other examples: If known protein possesses kinase activity, then could annotate the term ATP binding with IC code. If known to bind DNA, then could annotate ‘nucleus’ With IC code. Caution: The IC evidence code should not be used for something obvious. For example, if a gene product is being annotated to the function “protein kinase activity” (GO:0004672) by IDA, then it is also involved in the process “protein amino acid phosphorylation” (GO:0006468) by the same experiment (IDA).

Unknown v.s. Unannotated GO has three terms to be used when the curator has determined that there is no existing literature to support an annotation. Biological_process unknown GO:0000004 Molecular_function unknown GO:0005554 Cellular_component unknown GO:0008372 These are NOT the same as having no annotation at all. No annotation means that no one has looked yet.

http://www.geneontology.org/GO.annotation.html