Presentation is loading. Please wait.

Presentation is loading. Please wait.

GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL.

Similar presentations


Presentation on theme: "GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL."— Presentation transcript:

1 GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL

2 Outline 1.Introduction to the Gene Ontology 2.Gene Ontology annotations 3.Editing the Gene Ontology 4.Practical applications for the Gene Ontology 5.The Gene Ontology as one of many biological ontologies

3 Sequence databases: GenBank, EMBL, DDBJ Year19822005 Number of records 60244, 202,133

4 Genome Databases *Mouse Genome Informatics *FlyBase: Drosophila *WormBase: C. elegans *The Arabidopsis Information Resource *dictyBase: Dictyostelium discoideum *Saccharomyces Genome Database: Budding Yeast *ZFIN: Zebrafish *EcoGene - E. coli GeneCards Human ensembl NCBI human genome resources * manually curated by scientists

5

6 Published Literature PubMed: over 15 million citations Basic search: rad51 → 1038 articles Limit search: rad51, Human (organism) → 485 Boolean operators: rad51 AND cancer → 234 articles

7 Gene Ontology -Gene annotation system -Controlled vocabulary that can be applied to all organisms -Used to describe gene products

8 What’s in a name? What is a cell?

9 Cell

10

11

12

13 Image from http://microscopy.fsu.edu

14 What’s in a name? The same name can be used to describe different concepts

15 What’s in a name?

16 Glucose synthesis Glucose biosynthesis Glucose formation Glucose anabolism Gluconeogenesis All refer to the process of making glucose from simpler components

17 What’s in a name? The same name can be used to describe different concepts A concept can be described using different names  Comparison is difficult – in particular across species or across databases

18 What is the Gene Ontology? A (part of the) solution: -A controlled vocabulary that can be applied to all organisms -Used to describe gene products - proteins and RNA - in any organism

19 Ontology In philosophy, the most fundamental branch of metaphysics. It studies being or existence as well as the basic categories thereof—trying to find out what entities and what types of entities exist. – Wikipedia Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing – Gruber 1993

20 Ontology Includes: 1.A vocabulary of terms (names for concepts) 2.Definitions 3.Defined logical relationships to each other

21 Ontologies can be represented as graphs, where the nodes are connected by edges Nodes = concepts in the ontology Edges = relationships between the concepts node edge Ontology Structure

22 The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG) Terms can have more than one parent and zero, one or more children Terms are linked by two relationships –is-a –part-of

23 Simple hierarchies (Trees) Directed Acyclic Graphs Single parentOne or more parents

24 Directed Acyclic Graphs (DAG) is-a part-of [other protein complexes] [other organelles] protein complex organelle mitochondrion fatty acid beta-oxidation multienzyme complex

25 True Path Rule The path from a child term all the way up to its top-level parent(s) must always be true cell  cytoplasm  chromosome  nuclear chromosome  nucleus  nuclear chromosome is-a  part-of 

26 How does GO work? What does the gene product do? Why does it perform these activities? Where does it act? What information might we want to capture about a gene product?

27 GO: Three ontologies Where does it act? What processes is it involved in? What does it do?Molecular Function Cellular Component Biological Process gene product

28 Cellular Component where a gene product acts

29 Mitochondrial membrane

30 Biological Process

31 Gluconeogenesis

32 Molecular Function A single reaction or activity, not a gene product A gene product may have several functions Sets of functions make up a biological process

33 Molecular Function

34 Carbonate dehydratase activity

35 term: gluconeogenesis id: GO:0006094 definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. What’s in a GO term?

36 No GO Areas GO covers ‘normal’ functions and processes –No pathological processes –No experimental conditions NO evolutionary relationships NO gene products NOT a system of nomenclature

37

38 Molecular Function 7,309 terms Biological Process 10,041 terms Cellular Component 1,629 terms Total 18, 975 terms Definitions: 94.9 % Obsolete terms: 992 Content of GO As of October 2005

39 Outline 1.Introduction to the Gene Ontology 2.Gene Ontology annotations 3.Editing the Gene Ontology 4.Practical applications for the Gene Ontology 5.The Gene Ontology as one of many biological ontologies

40 Mitochondrial P450 Annotation of gene products with GO terms

41 Cellular component: mitochondrial inner membrane GO:0005743 Biological process: Electron transport GO:0006118 Molecular function: monooxygenase activity GO:0004497 substrate + O 2 = CO 2 +H 2 0 product

42 Other gene products annotated to monooxygenase activity (GO:0004497) - monooxygenase, DBH-like 1 (mouse) - prostaglandin I2 (prostacyclin) synthase (mouse) - flavin-containing monooxygenase (yeast) - ferulate-5-hydrolase 1 (arabidopsis)

43 Two types of GO Annotations:  Electronic Annotation  Manual Annotation All annotations must: be attributed to a source indicate what evidence was found to support the GO term-gene/protein association

44 Manual Annotations High–quality, specific gene/gene product associations made, using: Peer-reviewed papers Evidence codes to grade evidence BUT – is very time consuming and requires trained biologists

45 Electronic Annotations Provides large-coverage High-quality BUT – annotations tend to use high-level GO terms and provide little detail.

46 1.Database entries Manual mapping of GO terms to concepts external to GO (‘translation tables’) Proteins then electronically annotated with the relevant GO term(s) 2.Automatic sequence similarity analyses to transfer annotations between highly similar gene products Electronic Annotations: Methods

47 Fatty acid biosynthesis (Swiss-Prot Keyword) EC:6.4.1.2 (EC number) IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit ( InterPro entry) GO:Fatty acid biosynthesis ( GO:0006633 ) GO:acetyl-CoA carboxylase activity ( GO:0003989 ) GO:acetyl-CoA carboxylase activity (GO:0003989) Electronic Annotations

48 Mappings of external concepts to GO EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022 EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038 EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617 EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745

49 1.Extract information from published literature 2.Curators performs manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis) Manual Annotations: Methods

50 Finding GO terms In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response… Process: response to wounding GO:0009611 serine/threonine kinase activity, Function: protein serine/threonine kinase activity GO:0004674 integral membrane protein Component: integral to plasma membrane GO:0005887 …for B. napus PERK1 protein (Q9ARH1) PubMed ID: 12374299 wound response

51 Annotate to finest granularity Annotating to GO:0030047 automatically annotates to all of its parents; thus a product is annotated to both protein modification AND cytoskeleton organization

52 A gene product can have several functions, cellular locations and be involved in many processes Annotation of a gene product to one ontology is independent from its annotation to other ontologies Annotations are only to terms reflecting a normal activity or location Usage of ‘unknown’ GO terms Additional points

53 Unknown v.s. Unannotated “Unknown” is used when the curator has determined that there is no existing literature to support an annotation. –Biological process unknown GO:0000004 –Molecular function unknown GO:0005554 –Cellular component unknown GO:0008372 NOT the same as having no annotation at all –No annotation means that no one has looked yet

54 GO Evidence Codes CodeDefinition IEAInferred from Electronic Annotation NASNon-traceable Author Statement TASTraceable Author Statement NDNo Data Use with annotation to unknown IDAInferred from Direct Assay *IPIInferred from Physical Interaction *IGIInferred from Genetic Interaction IMPInferred from Mutant Phenotype IEPInferred from Expression Pattern *ICInferred from Curator *ISSInferred from Sequence Similarity Manually annotated

55 GO Evidence Codes *With column required Manually annotated CodeDefinition *IEAInferred from Electronic Annotation IDAInferred from Direct Assay IEPInferred from Expression Pattern *IGIInferred from Genetic Interaction IMPInferred from Mutant Phenotype *IPIInferred from Physical Interaction *ISSInferred from Sequence Similarity TASTraceable Author Statement NASNon-traceable Author Statement *ICInferred from Curator RCAInferred from Reviewed Computational Analysis NDNo Data IDA: Enzyme assays In vitro reconstitution (transcription) Immunofluorescence Cell fractionation TAS: In the literature source the original experiments referred to are traceable (referenced).

56 GO Evidence Codes: with/from *With column required Manually annotated Additional information required for certain evidence codes CodeDefinition *IEAInferred from Electronic Annotation IDAInferred from Direct Assay IEPInferred from Expression Pattern *IGIInferred from Genetic Interaction IMPInferred from Mutant Phenotype *IPIInferred from Physical Interaction *ISSInferred from Sequence Similarity TASTraceable Author Statement NASNon-traceable Author Statement *ICInferred from Curator RCAInferred from Reviewed Computational Analysis NDNo Data IGI: a gene identifier for the "other" gene involved in the interaction IPI: a gene or protein identifier for the "other" protein involved in the interaction IC: GO term from another annotation used as the basis of a curator inference

57 TAS/IDA IMP/IGI/IPI ISS/IEP NAS IEA Term Hierarchy

58 1. NOT a gene product is NOT associated with the GO term to document conflicting claims in the literature. 2. Contributes to distinguishes between individual subunit functions and whole complex functions used with GO Function Ontology 3. Colocalizes with transiently or peripherally associated with an organelle or complex used with GO Component Ontology Modifying the interpretation of an annotation: the Qualifier column

59 Annotation of a genome GO annotations are always work in progress Part of normal curation process –More specific information –Better evidence code Replace obsolete terms “Last reviewed” date

60 How to access the Gene ontology and its annotations 1. Downloads Ontologies Annotations : Gene association files Ontologies and Annotations 2. Web-based access AmiGO (http://www.godatabase.org) QuickGO (http://www.ebi.ac.uk/ego) among others…

61 GO ontology (gene_ontology.obo) format-version: 1.0 date: 20:10:2005 17:32 saved-by: jlomax auto-generated-by: DAG-Edit 1.419 rev 3 default-namespace: gene_ontology remark: cvs version: $Revision: 3.1176 $ [Term] id: GO:0000001 name: mitochondrion inheritance namespace: biological_process def: "The distribution of mitochondria\, including the mitochondrial genome\, into daughter cells after mitosis or meiosis\, mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824, PMID:11389764, SGD:mcc] is_a: GO:0048308 ! organelle inheritance is_a: GO:0048311 ! mitochondrion distribution [Term] id: GO:0000002 name: mitochondrial genome maintenance namespace: biological_process def: "The maintenance of the structure and integrity of the mitochondrial genome." [GO:ai] is_a: GO:0007005 ! mitochondrion organization and biogenesis [Term] id: GO:0000003 name: reproduction alt_id: GO:0019952 namespace: biological_process def: "The production by an organism of new individuals that contain some portion of their genetic material inherited from that organism." [GO:curators, ISBN:0198506732] subset: goslim_generic subset: goslim_plant subset: gosubset_prok is_a: GO:0008150 ! biological_process [Term] id: GO:0000004 name: biological process unknown namespace: biological_process def: "Used for the annotation of gene products whose process is not known or cannot be inferred." [SGD:curators] subset: goslim_generic subset: goslim_goa subset: goslim_plant subset: goslim_yeast subset: gosubset_prok is_a: GO:0008150 ! biological_process

62 Viewing GO terms (DAG-Edit)

63 http://www.geneontology.org/GO.current.annotations.shtml Gene Association Files

64 Anatomy of a gene association file ColumnContentExample 1DBSGD, MGI 2DB_Object IDMGI:1234568 3DB_Object_SymbolGras3 4GO_ID QualifierNOT, co_localizes_with, contributes_to 5GO_IDGO:0001515 6DB_RefPMID:234567 7Evidence_CodeIDA, etc. 8With/From 9GO_aspectP (process), C (component) F (function) 10DB_Object_NameGrasshopper 3 homlog 11DB_Object_SynonymLocust III, 0122345E12Rik 12DB_Object_TypeGene, transcript, or protein 13Taxon taxon:4932 14Date20050101 15Assigned_byDB (usually same as column 1)

65

66 Viewing Annotations Amigo Browser: http://www.godatabase.org http://www.godatabase.org –A GO browser that tracks contributed GO annotations across species. –Uses annotation sets supplied in a specific format.

67 AmiGO: http://www.godatabase.org

68

69

70 Symbol Information Source Evidence Reference Anxa6 Anxa6 annexin A6, RGD TAS RGD:724802RGDTASRGD:724802 gene from Rattus norvegicus

71

72

73 Filter queries by organism, data source or evidence Search for GO terms or by Gene symbol/name Querying the GO

74

75

76

77 http://www.ncbi.nlm.nih.gov/entrez

78 www.uniprot.org/

79 www.ensembl.org/

80 dictyBase Gene Page

81 Outline 1.Introduction to the Gene Ontology 2.Gene Ontology annotations 3.Editing the Gene Ontology 4.Practical applications for the Gene Ontology 5.The Gene Ontology as one of many biological ontologies

82 How is GO maintained? Several full-time editors Requests from community –database curators, researchers, software developers –SourceForge tracker GO Consortium meetings for large changes Mailing lists

83 Reactome

84 Terms become obsolete when they are removed or redefined GO IDs are never deleted For each term, a comment is added to explains why the term is now obsolete Ensuring Stability in a Dynamic Ontology Obsolete Cellular Component Obsolete Molecular Function Obsolete Biological Process Biological Process Molecular Function Cellular Component

85 Why modify the GO GO reflects current knowledge of biology New organisms being added makes existing terms arrangements incorrect Not everything perfect from the outset

86 Example - parasites Original GO:

87 Example - parasites Annotation of P. falciparum –protozoan cellular parasite –intracellular infection (erythrocytes) Parasite proteins located in host nucleus What cellular component term to annotate to? –‘nucleus’ refers to parasite nucleus when annotating parasite

88 Example - parasites Added new term ‘host’:

89 Example - parasites parasite gene products located in host nucleus annotated here parasite gene products located in parasite nucleus annotated here

90 Requesting changes to GO - curator requests tracker Common changes suggested: –new term requests –reporting errors (typos, etc) –obsoletion/merge requests –add synonym –queries –term move (change parents)

91 The GO editorial office Primary responsibility to edit ontologies in response to community needs Also: –website –documentation –outreach GO in other systems new annotation groups –training

92 Outline 1.Introduction to the Gene Ontology 2.Gene Ontology Annotations 3.Editing the Gene Ontology 4.Practical applications for the Gene Ontology 5.The Gene Ontology as one of many biological ontologies

93 Access gene product functional information Find how much of a proteome is involved in a process/ function/ component in the cell Map GO terms and incorporate manual annotations into own databases Provide a link between biological knowledge and … gene expression profiles proteomics data What can scientists do with GO?

94 attacked time control Puparial adhesion Molting cycle hemocyanin Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Immune response Toll regulated genes Amino acid catabolism Lipid metobolism Peptidase activity Protein catabloism Immune response Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI. …analysis of high-throughput data according to GO MicroArray data analysis

95 Color indicates up/down regulation GoMiner Tool, John Weinstein et al, Genome Biol. 4 (R28) 2003

96 http://www.geneontology.org/GO.tools

97 Outline 1.Introduction to the Gene Ontology 2.Gene Ontology Annotations 3.Editing the Gene Ontology 4.Practical applications for the Gene Ontology 5.The Gene Ontology as one of many biological ontologies

98 Beyond GO – Open Biomedical Ontologies Orthogonal to existing ontologies to facilitate combinatorial approaches - Share unique identifier space - Include definitions Anatomies Cell Types Sequence Attributes Temporal Attributes Phenotypes Diseases More…. http://obo.sourceforge.net

99 Sequence Ontology http://song.sourceforge.net

100 Ontology of ‘small molecular entities’ http://www.ebi.ac.uk/chebi

101 http://www.fruitfly.org/cgi-bin/ex/go.cgi

102 Anatomy Physiology Phenotype Pathway Disease Molecular Metabolic Developmental Stage Ontologies


Download ppt "GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL."

Similar presentations


Ads by Google