Gene Ontology (GO)
Emily Dimmer, GOA group, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK

GO Tutorial Outline:
- Introduction to GO
- Description of the GO ontologies
- How groups annotate to GO
- Practical: Investigating the GO and OBO web sites; browsing the GO using the AmiGO Browser
- Open Biomedical Ontologies
- How GO is being used
- Available Tools
- GO slims
- Practical: Creating your own GO slim

Why is GO needed?
THE PROBLEM:
- Huge body of knowledge with an extremely large vocabulary to describe it
- Vocabulary used is poorly defined
  - i.e. one word can have different meanings
  - or different names for the same concept
- Biological systems are complex and our knowledge of such systems is incomplete
RESULT: Large databases which are difficult to manage and impossible to mine computationally

What is GO?
A (part of the) solution:
GO: “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”

What can scientists do with GO?
- Access gene product functional information
- Provide a link between biological knowledge and …
  - gene expression profiles
  - proteomics data
- Find how much of a proteome is involved in a process/function/component in the cell
  - using a GO-Slim (a slimmed down version of GO to summarize biological attributes of a proteome)
- Map GO terms and incorporate manual GOA annotation into own databases
  - to enhance your dataset
  - or to validate automated ways of deriving information about gene function (text-mining).

Tactition? Tactile sense? Taction?

GO term: perception of touch ; GO: ; synonyms: tactition, tactile sense, taction

GO: Three (Orthogonal) Ontologies
- Molecular Function: elemental activity or task, e.g. DNA binding, catalysis of a reaction
- Biological Process: broad objective or goal, e.g. mitosis, signal transduction, metabolism
- Cellular Component: location or complex, e.g. nucleus, ribosome

How does GO work?
- Provides a standard, species-neutral way of representing biology
- GO covers ‘normal’ functions and processes
  - No pathological processes
  - No experimental conditions

Content of GO
Molecular Function: 7,493 terms
Biological Process: 9,640 terms
Cellular Component: 1,634 terms
Total: 18,767 terms
Definitions: 16,696 (93.9 %)

What is GO?
- NOT a system of nomenclature or a list of gene products
- GO doesn’t attempt to cover all aspects of biology or evolutionary relationships
  - Open Biomedical Ontologies
- NOT a dictated standard
- NOT a way to unify databases

Reactome

Anatomy of a GO term
GO terms are composed of:
- Term name
- Unique GO ID
- Definition (93 % of GO terms are defined)
- Synonyms (optional)
- Database references (optional)
- Relationships to other GO terms
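The parts of a GO term listed above can be sketched as a small record type. This is only an illustration, assuming the tag names of the OBO flat-file format (`id`, `name`, `namespace`, `def`, `synonym`, `xref`, `is_a`, `relationship`, `is_obsolete`). `GOTerm` and `parse_term_stanza` are hypothetical names, and the stanza uses made-up placeholder IDs rather than real GO entries.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """One GO term: name, ID, definition, synonyms, xrefs, relationships."""
    go_id: str = ""
    name: str = ""
    namespace: str = ""
    definition: str = ""
    synonyms: list = field(default_factory=list)
    xrefs: list = field(default_factory=list)
    is_a: list = field(default_factory=list)
    part_of: list = field(default_factory=list)
    obsolete: bool = False


def parse_term_stanza(text):
    """Parse a single OBO-style [Term] stanza into a GOTerm (sketch only)."""
    term = GOTerm()
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line == "[Term]":
            continue
        tag, _, value = line.partition(": ")
        if tag == "id":
            term.go_id = value
        elif tag == "name":
            term.name = value
        elif tag == "namespace":
            term.namespace = value
        elif tag == "def":
            term.definition = value.split('"')[1]   # text between the quotes
        elif tag == "synonym":
            term.synonyms.append(value.split('"')[1])
        elif tag == "xref":
            term.xrefs.append(value)
        elif tag == "is_a":
            term.is_a.append(value.split(" ! ")[0])  # drop "! parent name"
        elif tag == "relationship":
            rel, target = value.split(" ! ")[0].split()[:2]
            if rel == "part_of":
                term.part_of.append(target)
        elif tag == "is_obsolete":
            term.obsolete = (value == "true")
    return term


# Placeholder stanza: the IDs and texts are invented for illustration.
STANZA = """
[Term]
id: GO:9999990
name: example membrane
namespace: cellular_component
def: "A placeholder definition." [GOC:example]
synonym: "sample membrane" EXACT []
is_a: GO:9999991 ! parent term one
relationship: part_of GO:9999992 ! parent term two
"""

term = parse_term_stanza(STANZA)
print(term.go_id, term.name, term.is_a, term.part_of)
```

Keeping `is_a` and `part_of` as separate lists mirrors the two relationship types the ontology uses, and a term can carry several entries in either list.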

I. The GO Ontologies
Ontologies: “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber 1993)

Ontology applications
Can be used to:
- Formalise the representation of biological knowledge
- Describe a common and defined vocabulary for database annotation
- Standardise database submissions
- Provide unified access to information through ontology-based querying of databases, both human and computational
- Improve management and integration of data within databases
- Facilitate data mining

Ontology Structure
Ontologies can be represented as graphs, where the vertices (nodes and leaves) are connected by edges.
- The nodes are concepts in the ontology.
- The edges are the relationships between the concepts.
[Figure labels: node, edge]

The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG).
- Terms are linked by two relationships: is-a and part-of
- Terms can have more than one parent
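A minimal sketch of this structure, assuming terms are stored as (child, relationship, parent) triples. The term names loosely follow the labels on the DAG slide, but the exact edges are assumptions made for illustration, and `build_parents` and `is_acyclic` are hypothetical helper names.

```python
from collections import defaultdict, deque

# (child, relationship, parent) triples; the edges are illustrative assumptions.
EDGES = [
    ("membrane", "part_of", "cell"),
    ("chloroplast", "part_of", "cell"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
]


def build_parents(edges):
    """Map each term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG if every node can be removed
    in an order where all of its children were removed first."""
    nodes = {n for child, _, parent in edges for n in (child, parent)}
    incoming = {n: 0 for n in nodes}          # number of children per node
    up = defaultdict(list)
    for child, _, parent in edges:
        up[child].append(parent)
        incoming[parent] += 1
    queue = deque(n for n in nodes if incoming[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for parent in up[node]:
            incoming[parent] -= 1
            if incoming[parent] == 0:
                queue.append(parent)
    return removed == len(nodes)


parents = build_parents(EDGES)
print(parents["chloroplast membrane"])   # two parents, one per relationship
print(is_acyclic(EDGES))
```

Unlike a tree, nothing here limits a term to a single parent: "chloroplast membrane" has one is-a parent and one part-of parent.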

Simple hierarchies (Trees) vs. Directed Acyclic Graphs

Directed Acyclic Graph
[Figure: example DAG with node labels cell, membrane, chloroplast, mitochondrial, chloroplast membrane; edge types: is-a, part-of]

True Path Rule
The path from a child term all the way up to its top-level parent(s) must always be true.
[Figure: term hierarchy with labels cell, cytoplasm, chromosome, nuclear chromosome, nucleus, nuclear chromosome; legend: is-a, part-of]
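One practical consequence of the true path rule is that an annotation to a term also holds for every ancestor of that term. The sketch below assumes a small parent table built from the term names on this slide. The edges themselves are assumptions, and `ancestors` and `implied_terms` are hypothetical helper names.

```python
# term -> list of (relationship, parent); the edges are illustrative assumptions.
PARENTS = {
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "chromosome": [("part_of", "cell")],
    "nucleus": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents):
    """Collect every term reachable by following is_a / part_of upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def implied_terms(direct_annotations, parents):
    """All terms that must hold for a gene product, given its direct terms."""
    implied = set()
    for term in direct_annotations:
        implied.add(term)
        implied |= ancestors(term, parents)
    return implied


print(sorted(implied_terms(["nuclear chromosome"], PARENTS)))
```

If any ancestor in this set would be false for a gene product annotated to the child, the path is not "true" and the ontology structure needs revising.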

Ensuring Stability in a Dynamic Ontology
- Terms become obsolete when they are removed or redefined
- GO IDs are never deleted
- For each obsolete term, a comment is added to explain why the term is now obsolete
[Figure: the three ontologies (Biological Process, Molecular Function, Cellular Component) alongside Obsolete Biological Process, Obsolete Molecular Function, Obsolete Cellular Component]

Access to the Gene Ontology
- Download formats available: OBO, GO, XML, OWL, MySQL
- Web-based tools: AmiGO, QuickGO

II. Annotating to GO
Use of GO terms to represent the activities and localizations of gene products.
Basic information needed:
1. Database object (e.g. a protein or gene identifier), e.g. Q9ARH1
2. Reference ID, e.g. PubMed ID:
3. GO term ID, e.g. GO:
4. Evidence code, e.g. TAS

GenNav:

J. Clark et al. Plant Physiology 2005 (in press)

Two types of GO Annotation:
- Electronic Annotation
- Manual Annotation
All annotations must:
- be attributed to a source.
- indicate what evidence was found to support the GO term-gene/protein association.

Electronic Annotation
- Provides large coverage
- High-quality
- BUT annotations tend to use high-level GO terms and provide little detail.

Electronic Annotation
1. Assignment of GO terms to gene products using existing information within database entries
   - Manual mapping of GO terms to concepts external to GO (‘translation tables’).
   - Proteins then electronically annotated with the relevant GO term(s).
2. Automatic sequence analyses to transfer annotations between highly similar gene products

Electronic Annotation: example mappings
- Fatty acid biosynthesis (Swiss-Prot Keyword) → GO:Fatty acid biosynthesis (GO: )
- EC: (EC number) → GO:acetyl-CoA carboxylase activity (GO: )
- IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO: )
- MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO:DNA repair (GO: )
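The translation-table approach can be sketched as a dictionary lookup: each external concept found in a database entry is looked up in a manually built mapping, and every hit becomes an IEA annotation. This is a simplified illustration, not the GOA pipeline itself. The table maps to GO term names rather than IDs, and the protein accession `P00000`, the InterPro entry `IPR999999`, and the function name are placeholders.

```python
# Hypothetical translation table: (source, external concept) -> GO term name.
MAPPING = {
    ("Swiss-Prot keyword", "Fatty acid biosynthesis"): "fatty acid biosynthesis",
    ("InterPro", "IPR000438"): "acetyl-CoA carboxylase activity",
    ("HAMAP", "MF_00527"): "DNA repair",
}


def electronic_annotations(protein_id, cross_refs, mapping):
    """Turn the cross-references of one entry into IEA annotations."""
    annotations = []
    for source, concept in cross_refs:
        go_term = mapping.get((source, concept))
        if go_term is None:
            continue  # no mapping exists for this concept
        annotations.append({
            "protein": protein_id,
            "go_term": go_term,
            "evidence": "IEA",                 # electronic annotation
            "source": source + ":" + concept,  # what the term was derived from
        })
    return annotations


# Placeholder entry: "P00000" and "IPR999999" are invented for the example.
entry_refs = [
    ("InterPro", "IPR000438"),
    ("Swiss-Prot keyword", "Fatty acid biosynthesis"),
    ("InterPro", "IPR999999"),
]
result = electronic_annotations("P00000", entry_refs, MAPPING)
for annotation in result:
    print(annotation)
```

Recording the source of each hit keeps the annotation attributable, which matters because every annotation must name where it came from.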

Mappings of external concepts to GO

Evaluation of precision of electronic annotation techniques (InterPro2GO, SPKW2GO, EC2GO)
- Compared a manually curated test set of GO-annotated proteins with the electronic annotations
- InterPro2GO = most coverage
- EC2GO = 67 % of predictions exactly match the manual GO annotation
- % of time the 3 mappings predicted GO terms within the same lineage
Camon et al. BMC Bioinformatics 2005 in press

Manual Annotation
High-quality, specific gene/gene product associations made, using:
- Peer-reviewed papers
- Evidence codes to grade evidence
BUT is very time consuming and requires trained biologists

Finding GO terms
“In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…”
GO terms found for B. napus PERK1 protein (Q9ARH1), PubMed ID:
- “serine/threonine kinase activity”: Function: protein serine/threonine kinase activity, GO:
- “integral membrane protein”: Component: integral to plasma membrane, GO:
- “wound response”: Process: response to wounding, GO:

GO Evidence Codes
* = With column required. Codes other than IEA are manually annotated.
Code   Definition
*IEA   Inferred from Electronic Annotation
IDA    Inferred from Direct Assay
IEP    Inferred from Expression Pattern
*IGI   Inferred from Genetic Interaction
IMP    Inferred from Mutant Phenotype
*IPI   Inferred from Physical Interaction
*ISS   Inferred from Sequence Similarity
TAS    Traceable Author Statement
NAS    Non-traceable Author Statement
*IC    Inferred by Curator
RCA    Inferred from Reviewed Computational Analysis
ND     No Data
IDA examples: enzyme assays; in vitro reconstitution (transcription); immunofluorescence; cell fractionation.
TAS: in the literature source, the original experiments referred to are traceable (referenced).

GO Evidence Codes: the With column
Annotations using the codes marked * (IEA, IGI, IPI, ISS, IC) need an additional identifier in the With column:
- IGI: a gene identifier for the "other" gene involved in the interaction
- IPI: a gene or protein identifier for the "other" protein involved in the interaction
- IC: GO term from another annotation used as the basis of a curator inference
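The With-column rule lends itself to a simple check. The sketch below assumes the code list and the starred codes exactly as given on these slides. `check_evidence` is a hypothetical helper, not part of any GO tool.

```python
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "IDA": "Inferred from Direct Assay",
    "IEP": "Inferred from Expression Pattern",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
    "ISS": "Inferred from Sequence Similarity",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "ND": "No Data",
}

# Codes starred on the slide: the With column must be filled in.
WITH_REQUIRED = {"IEA", "IGI", "IPI", "ISS", "IC"}


def check_evidence(code, with_value=""):
    """Return a list of problems found for one evidence code / With pair."""
    problems = []
    if code not in EVIDENCE_CODES:
        problems.append("unknown evidence code: " + code)
    elif code in WITH_REQUIRED and not with_value.strip():
        problems.append(code + " requires an identifier in the With column")
    return problems


print(check_evidence("IPI", "UniProt:P50995"))  # With column supplied
print(check_evidence("IPI"))                    # With column missing
print(check_evidence("TAS"))                    # With column not needed
```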

…some extra things:
- Annotation of a gene product to one ontology is independent from its annotation to other ontologies.
- Gene products are annotated only to terms reflecting a normal activity or location.
- Usage of ‘unknown’ GO terms (e.g. Molecular function unknown GO: )

…some extra things: Qualifier Information
A set of ‘Qualifier’ terms is also available to curators to modify the interpretation of an annotation. Allowable values:
1. NOT: a gene product is not associated with the GO term; used to document conflicting claims in the literature.
2. Contributes to: distinguishes between individual subunit functions and whole complex functions (used with GO Function Ontology).
3. Colocalizes with: transiently or peripherally associated with an organelle or complex, or used where the resolution of an assay is not accurate (used with GO Component Ontology).
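The pairing of qualifiers with ontologies can be expressed as a lookup table. This sketch assumes the underscore spellings used in annotation files (`contributes_to`, `colocalizes_with`) and the one-letter aspect codes F, P and C. Allowing `NOT` with every aspect is an assumption, since the slide does not restrict it, and `qualifier_is_valid` is a hypothetical helper.

```python
# Which ontology aspects each qualifier may be combined with.
# F = Molecular Function, P = Biological Process, C = Cellular Component.
ALLOWED_ASPECTS = {
    "NOT": {"F", "P", "C"},        # assumption: no restriction stated
    "contributes_to": {"F"},       # used with the Function ontology
    "colocalizes_with": {"C"},     # used with the Component ontology
}


def qualifier_is_valid(qualifier, aspect):
    """True if the qualifier is empty or allowed for the given aspect."""
    if not qualifier:
        return True                # most annotations carry no qualifier
    return aspect in ALLOWED_ASPECTS.get(qualifier, set())


print(qualifier_is_valid("contributes_to", "F"))
print(qualifier_is_valid("colocalizes_with", "P"))
```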

Accessing annotations to the Gene Ontology
1. Downloads
- Annotations: gene association files
- Ontologies and annotations: MySQL and XML
2. Web-based access
- AmiGO
- QuickGO
…among others…

Gene Association File (via web: GO consortium page)
Columns: DB, DB_Object_ID, DB_Object_Symbol, Qualifier, GOid, DB:Reference, Evidence, With, Aspect, DB_Object_Name, DB_Object_Synonym, DB_Object_Type, taxon, Date, Assigned by
Example rows (columns DB to Aspect):
UniProt | P06703 | S106_HUMAN | | GO: | GOA:spkw | IEA | | F
UniProt | P06703 | S106_HUMAN | NOT | GO: | PMID: | NAS | | P
UniProt | P06703 | S106_HUMAN | | GO: | PMID: | IPI | UniProt:P50995 | F
Remaining columns as shown: Calcyclin | IPI | protein | taxon: | UniProt
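A gene association file can be read with a few lines of code. This is a minimal sketch, assuming a tab-delimited file with the fifteen columns listed on the slide and comment lines that begin with `!`. `parse_gaf` is a hypothetical helper, and the GO ID, PubMed ID, taxon and date in the sample line are placeholders, not real values.

```python
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GOid",
    "DB:Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "taxon", "Date", "Assigned_by",
]


def parse_gaf(lines):
    """Yield one dict per annotation line, keyed by column name."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GAF_COLUMNS):
            raise ValueError("expected %d columns, got %d"
                             % (len(GAF_COLUMNS), len(fields)))
        yield dict(zip(GAF_COLUMNS, fields))


# Sample input; GO:0000000, PMID:0000000, taxon:0 and the date are placeholders.
SAMPLE = [
    "!example gene association file\n",
    "\t".join([
        "UniProt", "P06703", "S106_HUMAN", "", "GO:0000000",
        "PMID:0000000", "IPI", "UniProt:P50995", "F", "Calcyclin",
        "", "protein", "taxon:0", "20050101", "UniProt",
    ]) + "\n",
]

rows = list(parse_gaf(SAMPLE))
for row in rows:
    print(row["DB_Object_ID"], row["GOid"], row["Evidence"], row["With"])
```

Empty fields such as Qualifier and With are kept as empty strings so that every row has the same keys.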

Summary
- GO is still being developed and updated; it requires a serious and ongoing effort.
  - the biological community is involved
- New model organism databases are joining the GO Consortium annotation effort

Practical session
1. Visit the GO website
2. Visit the OBO website
3. Browse the ontologies using the official GO Consortium Browser, AmiGO

GO web site: Part 1.

OBO web site:

AmiGO:

GO terms with no children

Querying the GO
- Search for GO terms or by Gene symbol/name
- Filter queries by organism, data source or evidence

GOst tool

QuickGO browser:

OBO and Gene Ontology Uses and Tools

Ontologies: Anatomy, Physiology, Phenotype, Pathway, Disease, Molecular, Metabolic, Developmental Stage

Beyond GO – Open Biomedical Ontologies
- Orthogonal to existing ontologies to facilitate combinatorial approaches
- Share unique identifier space
- Include definitions
Anatomies, Cell Types, Sequence Attributes, Temporal Attributes, Phenotypes, Diseases, More….

Sequence Ontology

Ontology of ‘small molecular entities’

Access to GO and its annotations

How to access the Gene Ontology and its annotations
1. Downloads
- Ontologies: various formats (GO, OBO, XML, OWL, MySQL)
- Annotations: gene association files
- Ontologies and Annotations: MySQL and XML
2. Web-based access
- AmiGO
- QuickGO
among others…

SRS view…

MicroArray data analysis
…analysis of high-throughput data according to GO
[Figure labels: attacked, time, control; GO terms shown: Puparial adhesion, Molting cycle, hemocyanin, Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Proteomics data analysis
…analysis of high-throughput data according to GO
GO classification
Kislinger T et al, Mol Cell Proteomics, 2003

Analysis of Data: Clustering

Color indicates up/down regulation GoMiner Tool, John Weinstein et al, Genome Biol. 4 (R28) 2003

Example of VLAD Output
Compare annotations associated with the test set to the entire set of GO annotations… DNA Repair seems to be a common theme.
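Comparing a test set against the full annotation set is often done with an over-representation test. The sketch below uses a one-sided hypergeometric test as an assumed stand-in; the slides do not say which statistic VLAD applies. All counts are invented for illustration, and `enrichment_p_value` is a hypothetical helper.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k annotated genes in a test set.

    N: genes in the full annotation set
    K: genes in the full set annotated to the GO term
    n: genes in the test set
    k: genes in the test set annotated to the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Invented counts: 8 of 20 test-set genes carry the term,
# against 50 of 1000 genes in the whole annotation set.
p = enrichment_p_value(k=8, n=20, K=50, N=1000)
print("p-value for over-representation:", p)
```

A small p-value suggests the term appears in the test set more often than chance alone would explain. In practice many terms are tested at once, so a multiple-testing correction would normally follow.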

…overview proteome with GO Slim

Off-the-shelf GO slims
map2slim.pl:
- distributed as part of the go-perl package
- maps a set of annotations up to their parent GO slim terms
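The idea of mapping annotations up to slim terms can be sketched as an upward walk that stops at the first slim term on each path. This is not the algorithm map2slim.pl implements, only an illustration. The term hierarchy is a simplified assumption, and `map_to_slim` is a hypothetical helper.

```python
from collections import Counter

# term -> parent terms; a simplified hierarchy assumed for illustration.
PARENTS = {
    "protein serine/threonine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "DNA binding": ["binding"],
    "binding": ["molecular function"],
}

SLIM = {"catalytic activity", "binding", "molecular function"}


def map_to_slim(term, parents, slim):
    """Walk upwards from a term and stop at the first slim term on each path."""
    hits, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            hits.add(current)   # do not climb past a slim term
            continue
        stack.extend(parents.get(current, []))
    return hits


# Invented gene products and their direct annotations.
ANNOTATIONS = {
    "geneA": ["protein serine/threonine kinase activity"],
    "geneB": ["DNA binding", "kinase activity"],
}

slim_counts = Counter()
for gene, terms in ANNOTATIONS.items():
    gene_slim = set()
    for term in terms:
        gene_slim |= map_to_slim(term, PARENTS, SLIM)
    slim_counts.update(gene_slim)   # count each gene once per slim term

print(dict(slim_counts))
```

Counting gene products per slim term gives the kind of proteome overview a GO slim is meant to provide.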

Summary
- The Gene Ontology project precipitated a generalized implementation for ontologies for molecular biology
- Bio-ontologies such as GO have facilitated development of systems for hypothesis generation in biological systems
- Further integration: creation of cross-products between different ontologies

Practical II – Creation of GO slims using the DAG-Edit tool.

…loading the GO

ftp://ftp.geneontology.org/pub/go/ontology/gene_ontology.obo …loading the GO

…browsing the GO

…viewing GO terms

…searching for GO terms

…creating a new GO slim

…creating a renderer for the GO slim

…adding terms to the GO slim

…filtering GO for terms in the GO slim

…removing filters/renderers

…saving the newly created GO slim