Download presentation
Presentation is loading. Please wait.
Published byPeter Arron Lawrence Modified over 9 years ago
1
EBI is an Outstation of the European Molecular Biology Laboratory. EBI Bioinformatics Roadshow 13 th June 2012 Rotterdam, Netherlands Duncan Legge Introduction to the Gene Ontology and GO Annotation Resources
2
OUTLINE OF TUTORIAL: PART I: Ontologies and the Gene Ontology (GO) PART II: GO Annotations How to access GO annotations How scientists use GO annotations
3
PART I: Gene Ontology
4
What does an ontology provide? 1. Consistent terminology – controlled vocabulary. 2. Relationships between terms – hierarchy.
5
Controlled vocabulary Q: What is a cell? A: It really depends who you ask!
6
Different things can be described by the same name
7
Glucose synthesis Glucose biosynthesis Glucose formation Glucose anabolism Gluconeogenesis The same thing can be described by different names:
8
Inconsistency in naming of biological concepts Same name for different concepts Different names for the same concept Comparison is difficult – in particular across species or across databases Just one reason why the Gene Ontology (GO) is is needed…
9
Why do we need GO? Large datasets need to be interpreted quickly Inconsistency in naming of biological concepts Increasing amounts of biological data available Increasing amounts of biological data to come
10
Increasing amounts of biological data available Search on mesoderm development…. you get 9441 results! Expansion of sequence information
11
What is an ontology? Dictionary: A branch of metaphysics concerned with the nature and relations of being (philosophy) A formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts (computer science) Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality. 1606 1700s
12
What is an ontology? More usefully: An ontology is the representation of something we know about. “Ontologies" consist of a representation of things, that are detectable or directly observable, and the relationships between those things.
13
What’s in an Ontology?
14
What is the Gene Ontology (GO)? A way to capture biological knowledge in a written and computable form Describes attributes of gene products (RNA and protein)
15
http://www.geneontology.org Reactome E. Coli hub
16
The scope of GO What information might we want to capture about a gene product? What does the gene product do? Where does it act? How does it act?
17
Biological Process what does a gene product do? cell division transcription A commonly recognised series of events
18
Cellular Component where is a gene product located? plasma membrane mitochondrion mitochondrial membrane mitochondrial matrix mitochondrial lumen ribosome large ribosomal subunit small ribosomal subunit
19
Molecular Function how does a gene product act? insulin binding insulin receptor activity glucose-6-phosphate isomerase activity
20
Three separate ontologies or one large one? GO was originally three completely independent hierarchies, with no relationships between them As of 2009, GO have started making relationships between biological process and molecular function in the live ontology
21
Function Process art of s a
22
GO IS: species independent covers normal processes GO is NOT: NO pathological/disease processes NO experimental conditions NO evolutionary relationships NOT a nomenclature system
23
Aims of the GO project Edit the ontologies Annotate gene products using ontology terms Provide a public resource of data and tools
24
Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross- references
25
node edge Ontology structure Nodes = terms in the ontology Edges = relationships between the concepts GO is structured as a hierarchical directed acyclic graph (DAG) Terms can have more than one parent and zero, one or more children node Terms are linked by reationships, which add to the meaning of the term Less specific More specific
26
Relationships between GO terms is_a part_of regulates positively regulates negatively regulates has_part
27
is_a If A is a B, then A is a subtype of B mitotic cell cycle is a cell cycle lyase activity is a catalytic activity. Transitive relationship: can infer up the graph
28
part_of Necessarily part of Wherever B exists, it is as part of A. But not all B is part of A. Transitive relationship (can infer up the graph) B A
29
regulates One process directly affects another process or quality Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B B A
30
Relationships are upside down compared to is_a and part_of Necessarily has part has_part GO and GO Annotation, EBI Bioinformatics Roadshow. Düsseldorf. March 2011
31
is_a complete For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships: we call this being is_a complete important for reasoning over the ontology, and ontology development
32
True path rule Child terms inherit the meaning of all their parent terms.
33
How is GO maintained? GO editors and annotators work with experts to remodel specific areas of the ontology Signaling Kidney development Transcription Pathogenesis Cell cycle Deal with requests from the community database curators, researchers, software developers Some simple requests can be dealt with automatically GO Consortium meetings for large changes Mailing lists, conference calls, content workshops
34
Requesting changes to the ontology Public Source Forge (SF) tracker for term related issues https://sourceforge.net/projects/geneontology/
35
Why modify the GO? GO reflects current knowledge of biology Information from new organisms can make existing terms and arrangements incorrect Not everything perfect from the outset Improving definitions Adding in synonyms and extra relationships
36
Searching for GO terms http://www.ebi.ac.uk/QuickGO/ http://amigo.geneontology.org … there are more browsers available on the GO Tools page: http://www.geneontology.org/GO.tools.browsers.shtml The latest OBO Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo
37
Exercise Browsing the Gene Ontology using QuickGO Exercise 1 15 mins
38
PART II: GO Annotation
39
A GO annotation is… A statement that a gene product: 1. has a particular molecular function Or is involved in a particular biological process Or is located within a certain cellular component 2. as determined by a particular evidence 3. as described in a particular reference
40
Evidence codes IDA: enzyme assay IPI: e.g. Y2H http://www.geneontology.org/GO.evidence.shtml review papers subcategories of ISS BLASTs, orthology comparison, HMMs
41
GO evidence code decision tree
42
GOA makes annotations using two methods Electronic Quick way of producing large numbers of annotations Annotations are less detailed Manual Time-consuming process producing lower numbers of annotations Annotations are very detailed and accurate
43
Electronic annotation by GOA 1. Mapping of external concepts to GO terms InterPro2GO (protein domains) SPKW2GO (UniProt/Swiss-Prot keywords) HAMAP2GO (Microbial protein annotation) EC2GO (Enzyme Commission numbers) SPSL2GO (Swiss-Prot subcellular locations )
44
Aspartate transaminase activity ; GO:0004069 lipid transport; GO:0006869 Electronic annotation by GOA
45
2. Automatic transfer of annotations to orthologs
46
Manual annotation by GOA High-quality, specific annotations using: Peer-reviewed papers A range of evidence codes to categorize the types of evidence found in a paper www.ebi.ac.uk/GOA
47
Finding annotations in a paper In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response… Process: response to wounding GO:0009611 wound response serine/threonine kinase activity, Function: protein serine/threonine kinase activity GO:0004674 integral membrane protein Component: integral to plasma membrane GO:0005887 …for B. napus PERK1 protein (Q9ARH1) PubMed ID: 12374299
48
Qualifiers Modify the interpretation of an annotation NOT (protein is not associated with the GO term) colocalizes_with (protein associates with complex but is not a bona fide member) contributes_to (describes action of a complex of proteins) 'With' column Can include further information on the method being referenced e.g. the protein accession of an interacting protein Additional information
49
The NOT qualifier NOT is used to make an explicit note that the gene product is not associated with the GO term Also used to document conflicting claims in the literature NOT can be used with ALL three gene ontologies
50
In these cells, SIPP1 was mainly present in the nucleus, where it displayed a non-uniform, speckled distribution and appeared to be excluded from the nucleoli. excluded from the nucleoli
51
The colocalizes_with qualifier ONLY used with GO component ontology Gene products that are transiently associated with an organelle or complex
52
The colocalizes_with qualifier Example (from Schizosaccharomyces pombe): Clp1 (Q9P7H1) relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the contractile ring (evidence from GFP fusion).
53
The contributes_to qualifier Where an individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the whole complex contributes_to is not needed to annotate a catalytic subunit. ONLY used with GO function ontology
54
.. To test whether the protein complex consisting of PIG-A, PIG-H, PIG-C and hGPI1 has GlcNAc transferase activity in vitro…. …incubation of the radiolabeled donor of GlcNAc, UDP- [6-3H]GlcNAc, with lysates of JY5 cells transfected with GST-tagged PIG-A resulted in synthesis of GlcNAc-PI and its subsequent deacetylation to glucosa- minyl phosphatidylinositol (GlcN-PI) whether the protein complex has GlcNAc transferase activity resulted in synthesis of GlcNAc-PI and Its subsequent deacetylation to glucosa-minyl phosphatidylinositol (GlcN-PI)
55
WITH column The with column provides supporting evidence for ISS, IPI, IGI and IC evidence codes ISS: the accession of the aligned protein/ortholog IPI: the accession of the interacting protein IGI: the accession of the interacting gene IC: The GO:ID for the inferred_from term WITH column
56
How to access GO annotation data
57
Where can you find annotations? UniProtKB Ensembl Entrez gene
58
Gene Association Downloads 17 column files containing all information for each annotation GO Consortium website GOA website
59
GO browsers
60
GO Slims
61
GO slims Many GO analysis tools use GO slims to give a broad overview of the dataset GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO GO slims usually contain less-specialised GO terms
62
Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:
63
Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms:
64
GO slims Custom slims are available for download; http://www.geneontology.org/GO.slims.shtml Or you can make your own using; QuickGO http://www.ebi.ac.uk/QuickGO AmiGO's GO slimmer http://amigo.geneontology.org/cgi-bin/amigo/slimmer
65
Just some things to be aware of…. The GO is continually changing New terms created Existing terms obsoleted Re-structured New annotations being created ALWAYS use a current version of ontology and annotations If publishing your analyses, please report the versions/dates you use: http://www.geneontology.org/GO.cite.shtml Differences in representation of GO terms may be due to biological phenomenon. But also may be due to annotation-bias or experimental assays Often better to remove the ‘NOT’ annotations before doing any large-scale analysis, as they can skew the results ontology annotation
66
How scientists use the GO, and the tools they use for analysis
67
Source of annotation If you wanted to find out the role of a gene product manually, you’d have to read an awful lot of papers But by using GO annotations, this work has already been done for you! GO:0006915 : apoptosis
68
How scientists use the GO Find out what a gene product does or which genes are involved in a certain biological process/function Analyse high-throughput genomic or proteomic datasets Validation of experimental techniques Get a broad overview of a proteome Obtain functional information for novel gene products Some examples…
69
time control Puparial adhesion Molting cycle Hemocyanin Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Immune response Toll regulated genes Amino acid catabolism Lipid metobolism Peptidase activity Protein catabolism Immune response attacked Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI. MicroArray data analysis
70
Validation of experimental techniques (Cao et al., Journal of Proteome Research 2006) Rat liver plasma membrane isolation
71
Analysis of high-throughput proteomic datasets (Orrù et al., Molecular and Cellular Proteomics 2007) Characterisation of proteins interacting with ribosomal protein S19
72
Obtain functional information for novel gene products MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTI MPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHS RDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIP DSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLA EGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRER ADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDS TRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM InterProScan
73
Annotating novel sequences Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence Two tools currently available; AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi searches the GO Consortium database BLAST2GO (from Babelomics) http://www.blast2go.org/ searches the NCBI database
74
AmiGO BLAST Exportin-T from Pongo abelii (Sumatran orangutan)
75
Numerous Third Party Tools Many tools exist that use GO to find common biological functions from a list of genes: http://www.geneontology.org/GO.tools.microarray.shtml
76
GO tools: enrichment analysis Most of these tools work in a similar way: input a gene list and a subset of ‘interesting’ genes tool shows which GO categories have most interesting genes associated with them i.e. which categories are ‘enriched’ for interesting genes tool provides a statistical measure to determine whether enrichment is significant
77
Exercises Searching for GO annotations in QuickGO Exercise 2: using GO terms Exercise 3: using a protein ID Using QuickGO to create a tailored set of annotations Exercise 4: Filtering Exercise 5: Statistics Map-up annotation using a GO slim Exercise 6
78
EBI is an Outstation of the European Molecular Biology Laboratory. Thanks for listening
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.