Lecture Four: GO: The Gene Ontology ----Infrastructure for Systems Biology.


1 Lecture Four: GO: The Gene Ontology ----Infrastructure for Systems Biology

2 S. cerevisiae

3 D. melanogaster

4 C. elegans: cells that normally survive have CED-9 ON and CED-3, CED-4 OFF; cells that normally die have CED-9 OFF and CED-3, CED-4 ON.

5 M. musculus

6 Comparison of sequences from 4 organisms: MCM3, MCM2, CDC46/MCM5, CDC47/MCM7, CDC54/MCM4, MCM6. These proteins form a hexamer in the species that have been examined.

7 The Gene Ontologies: A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

8 Gene Ontology - 1998
- FlyBase (Drosophila): Cambridge, EBI, Harvard, Berkeley & Bloomington
- SGD (Saccharomyces): Stanford
- MGI (Mus): Jackson Labs., Bar Harbor

9 Gene Ontology - now
- Fruitfly: FlyBase
- Budding yeast: Saccharomyces Genome Database (SGD)
- Mouse: Mouse Genome Database (MGD & GXD)
- Rat: Rat Genome Database (RGD)
- Weed: The Arabidopsis Information Resource (TAIR)
- Worm: WormBase
- Dictyostelium discoideum: dictyBase
- InterPro/UniProt at EBI: InterPro
- Fission yeast: PomBase
- Human: UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
- Parasites (Plasmodium, Trypanosoma, Leishmania): GeneDB, Sanger
- Microbes (Vibrio, Shewanella, B. anthracis, …): TIGR
- Grasses (rice & maize): Gramene database
- Zebrafish: ZFIN
- …

10 To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.

11
- Be open source
- Use open standards
- Make data & code available without constraint
- Involve your community

12 Gene Ontology Objectives
- GO represents concepts used to classify specific parts of our biological knowledge:
  – Biological Process
  – Molecular Function
  – Cellular Component
- GO develops a common language applicable to any organism
- GO terms can be used to annotate gene products from any species, allowing comparison of information across species

13 GO: Three ontologies (three questions about a gene product)
- What does it do? Molecular Function
- Where does it act? Cellular Component
- What processes is it involved in? Biological Process

14 Content of GO
- Molecular Function: 7,309 terms
- Biological Process: 10,041 terms
- Cellular Component: 1,629 terms
- Total: 18,975 terms
- Definitions: 94.9 %
- Obsolete terms: 992

15 What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

16 Annotation of gene products with GO terms: Mitochondrial P450

17 Cellular component: mitochondrial inner membrane (GO:0005743)
Biological process: electron transport (GO:0006118)
Molecular function: monooxygenase activity (GO:0004497)
Reaction: substrate + O2 = CO2 + H2O + product

18 Other gene products annotated to monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)

19

20

21 What’s in a name?
- Glucose synthesis
- Glucose biosynthesis
- Glucose formation
- Glucose anabolism
- Gluconeogenesis
All refer to the process of making glucose from simpler components.
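The point of the slide above can be sketched as a lookup that maps every name people use onto one term identifier. This is a minimal illustration: the table is built by hand from the names on the slide and the id GO:0006094 from slide 15, and the function name `resolve_term` is hypothetical.

```python
# Hand-built synonym table taken from the slide; a real table would be
# loaded from the synonym entries of the ontology file (assumption).
GLUCONEOGENESIS = "GO:0006094"

SYNONYMS = {
    "glucose synthesis": GLUCONEOGENESIS,
    "glucose biosynthesis": GLUCONEOGENESIS,
    "glucose formation": GLUCONEOGENESIS,
    "glucose anabolism": GLUCONEOGENESIS,
    "gluconeogenesis": GLUCONEOGENESIS,
}


def resolve_term(name):
    """Return the GO id for a name or synonym, or None if it is not in the table."""
    return SYNONYMS.get(name.strip().lower())


for label in ("Glucose anabolism", "Gluconeogenesis", "glycolysis"):
    print(label, "->", resolve_term(label))
```

Because every synonym resolves to the same identifier, two databases that use different wording still end up with comparable annotations.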

22 tree directed acyclic graph

23 Parent-Child Relationships
Nucleus
- Nucleoplasm
- Nuclear envelope
- Chromosome
- Perinuclear space
- Nucleolus
A child is a subset of a parent’s elements. The cell component term Nucleus has 5 children.

24 Ontology Relationships Directed Acyclic Graph

25

26 Evidence Codes for GO Annotations http://www.geneontology.org/doc/GO.evidence.html

27
- IEA: Inferred from Electronic Annotation
- ISS: Inferred from Sequence Similarity
- IEP: Inferred from Expression Pattern
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IPI: Inferred from Physical Interaction
- IDA: Inferred from Direct Assay
- RCA: Inferred from Reviewed Computational Analysis
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IC: Inferred by Curator
- ND: No biological Data available
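The evidence codes can be held in a small lookup table and used to filter annotations. The sketch below assumes that IEA is the only code assigned without curator review, which matches the electronic/manual split used later in the lecture. The gene names and the helper `is_manual` are hypothetical.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def is_manual(code):
    """True for curator-assigned codes; assumes IEA is the only electronic code."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code}")
    return code != "IEA"


# Hypothetical annotations: (gene product, GO id, evidence code).
annotations = [
    ("geneA", "GO:0004497", "IDA"),
    ("geneB", "GO:0004497", "IEA"),
    ("geneC", "GO:0006094", "TAS"),
]

manual_only = [a for a in annotations if is_manual(a[2])]
for gene, go_id, code in manual_only:
    print(gene, go_id, EVIDENCE_CODES[code])
```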

28 Annotation summaries. Meloidogyne incognita: McCarter et al. 2003

29

30 Two types of GO Annotations:
- Electronic Annotation
- Manual Annotation
All annotations must:
- be attributed to a source
- indicate what evidence was found to support the GO term-gene/protein association

31 Manual Annotations
High-quality, specific gene/gene product associations made using:
- Peer-reviewed papers
- Evidence codes to grade evidence
BUT: manual annotation is very time consuming and requires trained biologists.

32 Manual Annotations: Methods
1. Extract information from published literature
2. Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)

33 Finding GO terms
“In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…” (PubMed ID: 12374299)
- “wound response”: Process: response to wounding (GO:0009611)
- “serine/threonine kinase activity”: Function: protein serine/threonine kinase activity (GO:0004674)
- “integral membrane protein”: Component: integral to plasma membrane (GO:0005887)

34 Electronic Annotations
- Provide large coverage
- High quality
BUT: annotations tend to use high-level GO terms and provide little detail.

35 Electronic Annotations: Methods
1. Database entries
   – Manual mapping of GO terms to concepts external to GO (‘translation tables’)
   – Proteins then electronically annotated with the relevant GO term(s)
2. Automatic sequence similarity analyses to transfer annotations between highly similar gene products

36 Electronic Annotations
Fatty acid biosynthesis (Swiss-Prot Keyword) > GO:fatty acid biosynthesis ; GO:0006633
EC:6.4.1.2 (EC number) > GO:acetyl-CoA carboxylase activity ; GO:0003989
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) > GO:acetyl-CoA carboxylase activity ; GO:0003989

37 Mappings of external concepts to GO
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
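Mapping lines in this `external > GO:name ; GO:id` layout are easy to read into a translation table. The following is a minimal sketch that assumes every line has exactly the shape shown on the slide. The function name `parse_mapping_line` is hypothetical, and real mapping files may contain comment lines or several GO terms per entry, which this sketch does not handle.

```python
MAPPING_LINES = [
    "EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022",
    "EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038",
    "EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617",
    "EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745",
]


def parse_mapping_line(line):
    """Split 'EXTERNAL > GO:name ; GO:id' into (external, name, go_id)."""
    external, sep, go_part = line.partition(" > ")
    if not sep:
        raise ValueError(f"not a mapping line: {line!r}")
    name, sep, go_id = go_part.rpartition(" ; ")
    if not sep:
        raise ValueError(f"missing GO id: {line!r}")
    name = name.strip()
    if name.startswith("GO:"):
        name = name[3:]  # drop the 'GO:' label in front of the term name
    return external.strip(), name, go_id.strip()


# Translation table: external concept -> (GO term name, GO id).
translation = {}
for line in MAPPING_LINES:
    external, name, go_id = parse_mapping_line(line)
    translation[external] = (name, go_id)

print(translation["EC:1.1.1.1"])
```

A protein record that carries the EC number EC:1.1.1.1 can then be given the corresponding GO term by a single dictionary lookup, which is the electronic annotation step described on slide 35.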

38 Annotate to finest granularity. Annotating to GO:0030047 automatically annotates to all of its parents; thus a product is annotated to both protein modification AND cytoskeleton organization.
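Propagation to parents can be sketched as a walk up the directed acyclic graph. The graph below is a simplified toy: it links GO:0030047 directly to the two parents named on the slide and omits any intermediate terms. The parent names stand in for real GO ids, and the function names are hypothetical.

```python
# Toy child -> parents table (assumption: intermediate terms omitted).
PARENTS = {
    "GO:0030047": ["protein modification", "cytoskeleton organization"],
    "protein modification": ["biological process"],
    "cytoskeleton organization": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(term, parents=PARENTS):
    """Terms a gene product is implicitly annotated to: the term plus its ancestors."""
    return {term} | ancestors(term, parents)


print(sorted(propagate("GO:0030047")))
```

The `seen` set matters because a term in a DAG can be reached along more than one path. Here "biological process" is reachable through both parents but is recorded once.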

39 Additional points
- A gene product can have several functions and cellular locations, and can be involved in many processes
- Annotation of a gene product to one ontology is independent from its annotation to other ontologies
- Annotations are only to terms reflecting a normal activity or location
- Usage of ‘unknown’ GO terms

40 Unknown vs. Unannotated
“Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
  – Biological process unknown GO:0000004
  – Molecular function unknown GO:0005554
  – Cellular component unknown GO:0008372
NOT the same as having no annotation at all
  – No annotation means that no one has looked yet
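The distinction can be made explicit in code by treating "unknown" as a real annotation and "unannotated" as the absence of one. This is a minimal sketch using the three "unknown" ids from the slide. The gene names, the table layout and the function `annotation_status` are hypothetical.

```python
# The three 'unknown' terms listed on the slide.
UNKNOWN_TERMS = {
    "GO:0000004": "biological process unknown",
    "GO:0005554": "molecular function unknown",
    "GO:0008372": "cellular component unknown",
}

# Hypothetical annotation table: gene product -> list of GO ids.
ANNOTATIONS = {
    "geneA": ["GO:0004497", "GO:0005743"],
    "geneB": ["GO:0000004"],  # a curator looked and found no literature
    # geneC is absent: nobody has looked yet
}


def annotation_status(gene, annotations=ANNOTATIONS):
    """Classify a gene product as 'annotated', 'unknown' or 'unannotated'."""
    terms = annotations.get(gene)
    if not terms:
        return "unannotated"
    if all(term in UNKNOWN_TERMS for term in terms):
        return "unknown"
    return "annotated"


for gene in ("geneA", "geneB", "geneC"):
    print(gene, annotation_status(gene))
```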

41 Annotation of a genome
- GO annotations are always work in progress
- Part of normal curation process
  – More specific information
  – Better evidence code
- Replace obsolete terms
- “Last reviewed” date

42 How to access the Gene Ontology and its annotations
1. Downloads
   – Ontologies
   – Annotations: gene association files
   – Ontologies and Annotations
2. Web-based access
   – AmiGO (http://www.godatabase.org)
   – QuickGO (http://www.ebi.ac.uk/ego)
   – among others…
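A downloaded gene association file can be read with a few lines of code. The sketch below assumes the tab-delimited layout of the GO annotation file format, in which the first nine columns are DB, DB object ID, symbol, qualifier, GO ID, reference, evidence code, with/from and aspect, and lines beginning with `!` are comments. The sample row is invented for illustration (including the database name, object id and evidence code); only the symbol, GO id and PubMed id echo slide 33.

```python
from collections import namedtuple

Annotation = namedtuple(
    "Annotation",
    "db object_id symbol qualifier go_id reference evidence aspect",
)


def parse_gene_associations(lines):
    """Yield one Annotation per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            raise ValueError(f"expected at least 9 columns, got {len(cols)}")
        # Column 8 (with/from, index 7) is skipped in this sketch.
        yield Annotation(cols[0], cols[1], cols[2], cols[3],
                         cols[4], cols[5], cols[6], cols[8])


# Hypothetical file content: one comment line and one invented data row.
SAMPLE = [
    "!example header line\n",
    "\t".join([
        "EXAMPLE_DB", "X0001", "PERK1", "", "GO:0004674",
        "PMID:12374299", "IDA", "", "F",
    ]) + "\n",
]

records = list(parse_gene_associations(SAMPLE))
for rec in records:
    print(rec.symbol, rec.go_id, rec.evidence, rec.aspect)
```

The aspect column uses one letter per ontology (P for process, F for function, C for component), so the three ontologies can be separated without looking up each term.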

43 Groups. Lecture Four: paper discussion (in-class discussion, about 5 minutes). A C D E H M S

