Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.



All workshop materials are available at AgBase.

Genomic Annotation
Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
1. identifying functional elements in the genome: “structural annotation”
2. attaching biological information to these elements: “functional annotation”
Biologists often use the term “annotation” when they are referring only to structural annotation.

Structural annotation:
[Figure: DNA annotation and protein annotation. Labels: CHICK_OLF6; TRAF 1, 2 and 3; TRAF 1 and 2. Data from Ensembl Genome browser.]

Functional annotation:
[Figure: catenin]

Structural & Functional Annotation
Structural Annotation:
- Open reading frames (ORFs) predicted during genome assembly
- predicted ORFs require experimental confirmation
- the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation
Functional Annotation:
- annotation of gene products = Gene Ontology (GO) annotation
- initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
- functional literature exists for many genes/proteins prior to genome sequencing
- GO annotation does not rely on a completed genome sequence!

1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for research community

Introduction to GO
1. pre-GO: managing large datasets
2. Bio-ontologies
3. the Gene Ontology (GO)
   - a GO annotation example
   - GO evidence codes
   - literature biocuration & computational analysis
   - ND vs no GO
   - sources of GO

1. pre-GO: managing large datasets

AgBase User Support
- Functional modeling training
- Database ID mapping
   - approx. 75% of requests
- Providing GO annotation for datasets/arrays
- Assistance with GO modeling tools
- Intermediary between research community and public databases
   - NCBI, UniProtKB, GO Consortium
- Computational assistance

Converting database accessions
- UniProt database
- Ensembl BioMart
- Online analysis tools: DAVID, g:profiler, etc.
- AgBase database: ArrayIDer tool
More information about these tools is available from the online workshop resources.

1. UniProt ID Mapping

2. Ensembl BioMart
NOTE: Ensembl is scheduled to add plant & microbe species in 2009.

3. Online analysis tools: g:profiler conversion tool
This tool works for all species found in Ensembl.

3. Online analysis tools: Database for Annotation, Visualization and Integrated Discovery (DAVID)
This tool works for a wide range of species.

4. AgBase: ArrayIDer
Contact AgBase to request additional species.
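Whichever of the tools above is used, the result is usually a two-column table pairing each of your accessions with one or more accessions in the target database. The sketch below shows one way such a table might be applied to a list of IDs. It assumes a tab-delimited layout; the function names and the accessions are hypothetical and do not come from any of the tools listed.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle):
    """Read a two-column, tab-delimited mapping: source ID -> set of target IDs."""
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        mapping[row[0].strip()].add(row[1].strip())
    return mapping


def convert(ids, mapping):
    """Split IDs into those that map (with their targets) and those that do not."""
    mapped = {i: sorted(mapping[i]) for i in ids if i in mapping}
    unmapped = [i for i in ids if i not in mapping]
    return mapped, unmapped


# Hypothetical accessions, for illustration only.
EXAMPLE = "probe_1\tACC_A\nprobe_1\tACC_B\nprobe_2\tACC_C\n"
mapping = load_mapping(io.StringIO(EXAMPLE))
mapped, unmapped = convert(["probe_1", "probe_2", "probe_3"], mapping)
print(mapped)
print(unmapped)
```

Keeping the unmapped IDs in a separate list makes it easy to see how much of a dataset still needs a second tool or a request for additional species.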

2. Bio-ontologies

Bio-ontologies
Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
- necessary for high-throughput “omics” datasets
- allows data sharing across databases
Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined. The ontology shows how the objects relate to each other.

Bio-ontologies:

Ontologies
- digital identifier (computers)
- description (humans)
- relationships between terms
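The three parts of an ontology term can be pictured as a small data structure. This is only a sketch: the `Term` class, the `ancestors` helper, and the `TOY:` identifiers are made up for illustration, and the parent links are a toy arrangement rather than the real GO graph.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str                                 # digital identifier, read by computers
    name: str                                    # description, read by humans
    parents: list = field(default_factory=list)  # relationships to other terms


def ancestors(term_id, ontology):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(ontology[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology[current].parents)
    return seen


# Toy terms with made-up identifiers, for illustration only.
ontology = {
    "TOY:1": Term("TOY:1", "cellular component"),
    "TOY:2": Term("TOY:2", "mitochondrion", ["TOY:1"]),
    "TOY:3": Term("TOY:3", "mitochondrial matrix", ["TOY:2"]),
}
print(sorted(ancestors("TOY:3", ontology)))
```

Because the relationships are stored explicitly, a program can work out that something annotated to the most specific term also belongs under each of its broader terms.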

3. The Gene Ontology

Functional Annotation
- Gene Ontology (GO) is the de facto method for functional annotation
- Widely used for functional genomics (high throughput)
- Many tools available for gene expression analysis using GO
The GO Consortium homepage:

GO Mapping Example
NDUFAB1 (UniProt P52505): Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
- GO: fatty acid biosynthetic process TAS
- GO: mitochondrial electron transport, NADH to ubiquinone TAS
- GO: lipid biosynthetic process IEA
Cellular Component (CC or C)
- GO: mitochondrial matrix IDA
- GO: mitochondrial respiratory chain complex I IDA
- GO: mitochondrion IEA
Molecular Function (MF or F)
- GO: fatty acid binding IDA
- GO: NADH dehydrogenase (ubiquinone) activity TAS
- GO: oxidoreductase activity TAS
- GO: acyl carrier activity IEA

Each annotation in the NDUFAB1 example has four parts:
- aspect or ontology
- GO:ID (unique)
- GO term name
- GO evidence code

GO EVIDENCE CODES
Direct Evidence Codes
- IDA - inferred from direct assay
- IEP - inferred from expression pattern
- IGI - inferred from genetic interaction
- IMP - inferred from mutant phenotype
- IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
- IGC - inferred from genomic context
- TAS - traceable author statement
- NAS - non-traceable author statement
- IC - inferred by curator
inferred by sequence analysis
- RCA - inferred from reviewed computational analysis
- IS* - inferred from sequence*
- IEA - inferred from electronic annotation
Other
- NR - not recorded (historical)
- ND - no biological data available
* IS codes:
- ISS - inferred from sequence or structural similarity
- ISA - inferred from sequence alignment
- ISO - inferred from sequence orthology
- ISM - inferred from sequence model
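One practical use of the evidence codes is to see how much of a protein's annotation rests on direct experiments and how much on literature statements or sequence analysis. The sketch below groups the codes the way the slide does and counts the NDUFAB1 annotations from the mapping example. The function names and the tuple layout are assumptions made for illustration, and the GO IDs are left out.

```python
from collections import Counter

# Grouping copied from the evidence-code slide; IS* is expanded to its four codes.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"IGC", "TAS", "NAS", "IC"},
    "sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group name for an evidence code."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unrecognised"


def summarize(annotations):
    """Count annotations per group; each annotation is (aspect, term name, code)."""
    return Counter(evidence_group(code) for _, _, code in annotations)


# The NDUFAB1 annotations from the mapping example.
NDUFAB1 = [
    ("P", "fatty acid biosynthetic process", "TAS"),
    ("P", "mitochondrial electron transport, NADH to ubiquinone", "TAS"),
    ("P", "lipid biosynthetic process", "IEA"),
    ("C", "mitochondrial matrix", "IDA"),
    ("C", "mitochondrial respiratory chain complex I", "IDA"),
    ("C", "mitochondrion", "IEA"),
    ("F", "fatty acid binding", "IDA"),
    ("F", "NADH dehydrogenase (ubiquinone) activity", "TAS"),
    ("F", "oxidoreductase activity", "TAS"),
    ("F", "acyl carrier activity", "IEA"),
]
print(summarize(NDUFAB1))
```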

Biocuration of literature
- detailed function
- “depth”
- slower (manual)

Biocuration of Literature: detailed gene function
Find a paper about the protein.
P05147 PMID:

- Read paper to get experimental evidence of function
- Use most specific term possible
- experiment assayed kinase activity: use IDA evidence code

Biocuration of literature
- detailed function
- “depth”
- slower (manual)
Sequence analysis
- rapid (computational)
- “breadth” of coverage
- less detailed

Computational GO annotation (“breadth”)
Ranjit Kumar
ISO PIPELINE
- accessions from your species (species 1)
- public orthology prediction tool(s)
- 1:1 orthologs
- existing GO annotations (ga file)
- transfer GO annotation to your species (ISO)
- accessions with no ISO
IEA PIPELINE
- fasta file of sequences (aa or nt)
- InterPro analysis (domains/motifs)
- domains/motifs in sequence
- GO2InterPro mapping file
- assign GO (IEA)
- no GO: “ND”
ga file (integrate output into one ga file)
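The order of the two pipelines can be sketched in a few lines: try orthology transfer first, fall back to domain-based assignment, and record “ND” when neither yields a term. This is an illustration of the flow only, not the actual pipeline code. The function, its arguments, and every identifier in the example are hypothetical, and the orthology and InterPro steps are assumed to have been run already so that their results arrive as plain dictionaries.

```python
ROOT = "root term"  # placeholder for the aspect's root term used with ND


def annotate(accessions, orthologs, existing_go, domains, domain2go):
    """Return {accession: [(term, evidence code), ...]} using ISO, then IEA, then ND."""
    result = {}
    for acc in accessions:
        # ISO step: copy existing annotations from a 1:1 ortholog, if there is one.
        ortholog = orthologs.get(acc)
        terms = existing_go.get(ortholog, []) if ortholog else []
        if terms:
            result[acc] = [(t, "ISO") for t in terms]
            continue
        # IEA step: map the domains/motifs found in the sequence to GO terms.
        terms = sorted({t for d in domains.get(acc, []) for t in domain2go.get(d, [])})
        if terms:
            result[acc] = [(t, "IEA") for t in terms]
        else:
            # Neither route produced a term: record ND.
            result[acc] = [(ROOT, "ND")]
    return result


# Hypothetical inputs, for illustration only.
result = annotate(
    accessions=["sp1_a", "sp1_b", "sp1_c"],
    orthologs={"sp1_a": "sp2_a"},           # 1:1 orthologs in another species
    existing_go={"sp2_a": ["term_x"]},      # from an existing ga file
    domains={"sp1_b": ["dom_1"]},           # from domain/motif analysis
    domain2go={"dom_1": ["term_y"]},        # from the mapping file
)
print(result)
```

The combined dictionary plays the role of the single integrated ga file at the end of the diagram.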

Unknown Function vs No GO
ND (no data):
- Biocurators have tried to add GO but there is no functional data available
- Previously: “process_unknown”, “function_unknown”, “component_unknown”
- Now: “biological process”, “molecular function”, “cellular component”
No annotations (including no “ND”): biocurators have not annotated
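The distinction matters when filtering a dataset, because the two cases look alike unless the evidence codes are checked. A minimal sketch, assuming each gene product comes with a list of the evidence codes on its annotations (the function name and the status labels are made up for illustration):

```python
def annotation_status(codes):
    """Classify a gene product from the evidence codes of its annotations."""
    if not codes:
        return "not annotated"        # biocurators have not annotated it
    if all(code == "ND" for code in codes):
        return "no data available"    # curated, but no functional data exists
    return "annotated"


print(annotation_status([]))
print(annotation_status(["ND", "ND", "ND"]))
print(annotation_status(["ND", "IDA"]))
```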

1. Primary sources of GO: from the GO Consortium (GOC) & GOC members
   - most up to date
   - most comprehensive
2. Secondary sources: other resources that use GO provided by GOC members
   - public databases (e.g. NCBI, UniProtKB)
   - genome browsers (e.g. Ensembl)
   - array vendors (e.g. Affymetrix)
   - GO expression analysis tools

Different tools and databases display GO annotations differently. Because GO terms change continually and new GO annotations are continually added, you need to know when a source's GO annotations were last updated.

Secondary Sources of GO annotation
EXAMPLES:
- public databases (e.g. NCBI, UniProtKB)
- genome browsers (e.g. Ensembl)
- array vendors (e.g. Affymetrix)
CONSIDERATIONS:
- What is the original source?
- When was it last updated?
- Are evidence codes displayed?
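When annotations are supplied as a gene association (ga) file, the three considerations can be checked from the file itself. The sketch below assumes the tab-delimited GAF layout, with the evidence code in column 7, the annotation date (YYYYMMDD) in column 14, and the assigning group in column 15. The helper names and the two example rows are hypothetical, with every other column filled by a placeholder.

```python
from datetime import datetime


def summarize_gaf(lines):
    """Report who assigned the annotations, the newest date, and the evidence codes seen."""
    sources, codes, latest = set(), set(), None
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a full annotation row
        codes.add(cols[6])                                    # column 7: evidence code
        stamp = datetime.strptime(cols[13], "%Y%m%d").date()  # column 14: date
        latest = stamp if latest is None or stamp > latest else latest
        sources.add(cols[14])                                 # column 15: assigned by
    return {"sources": sorted(sources), "latest": latest, "codes": sorted(codes)}


def row(evidence, date, assigned_by):
    """Build a 15-column placeholder row; only three columns carry real values."""
    cols = ["-"] * 15
    cols[6], cols[13], cols[14] = evidence, date, assigned_by
    return "\t".join(cols)


# Hypothetical rows, for illustration only.
example = [
    "!gaf-version: 2.0",
    row("IDA", "20090301", "SourceA"),
    row("IEA", "20090501", "SourceB"),
]
summary = summarize_gaf(example)
print(summary)
```

If a secondary source cannot provide this information, the annotations are better taken from the primary source.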

For more information about GO
- GO Evidence Codes:
- gene association file information:
- tools that use the GO:
- GO Consortium wiki:
All websites are available from the workshop website & handout.