European Bioinformatics Institute The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO Nicky Mulder


1 European Bioinformatics Institute The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO Nicky Mulder mulder@ebi.ac.uk

2 Contents: Introduction to GOA; Manual GOA annotation; Electronic annotation: InterPro2GO; GOA data flow; Uses of GOA; Future plans.

3 What is GO annotation? An annotation is a statement that a gene product has a particular molecular function, is involved in a particular biological process, or is located within a certain cellular component, as determined by a particular method and as described in a particular reference. Each annotation therefore records a GO term ID, an evidence code and a reference.
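The parts of an annotation statement can be pictured as a small record type. This is a minimal Python sketch, not GOA's actual schema: the class name `GOAnnotation`, the one-letter aspect codes and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass
import re

# Assumed format for GO identifiers: "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# Hypothetical one-letter codes for the three GO aspects.
ASPECTS = {"F": "molecular function", "P": "biological process", "C": "cellular component"}


@dataclass(frozen=True)
class GOAnnotation:
    """One statement linking a gene product to a GO term (illustrative only)."""
    gene_product: str   # e.g. a UniProtKB accession
    go_id: str          # GO term ID
    aspect: str         # "F", "P" or "C"
    evidence_code: str  # how the association was determined
    reference: str      # where it was described, e.g. a PubMed ID

    def __post_init__(self) -> None:
        # Reject records that cannot be a complete annotation statement.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO ID: {self.go_id!r}")
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")
        if not self.evidence_code or not self.reference:
            raise ValueError("an annotation needs an evidence code and a reference")

    def describe(self) -> str:
        return (f"{self.gene_product} {ASPECTS[self.aspect]} {self.go_id} "
                f"[{self.evidence_code}, {self.reference}]")
```

For example, `GOAnnotation("P12345", "GO:0004022", "F", "IDA", "PMID:12345678")` (hypothetical accession and PubMed ID) passes validation, while a GO ID written as `"GO:4022"` raises `ValueError`.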

4 Gene Ontology Annotation (GOA) Database. GOA's priority is to annotate the human, mouse and rat proteomes. GOA is the largest open-source contributor of annotations to GO, providing 10 million annotations for more than 111,000 species, and it shares and integrates GO annotations.

5 How do we annotate GO terms? Through manual annotation and electronic annotation. All annotations must be attributed to a source and must indicate what evidence was found to support the association between the GO term and the gene or protein.
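The two requirements on this slide, a source and supporting evidence, can be checked mechanically. The Python sketch below assumes that the IEA evidence code (used for electronic annotation, slide 11) is what separates electronic from manual annotations; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_annotation(source: str, evidence_code: str) -> str:
    """Return "electronic" or "manual" for one annotation (illustrative rule)."""
    # Every annotation must be attributed to a source...
    if not source:
        raise ValueError("annotation is not attributed to a source")
    # ...and must say what evidence supports the term-protein association.
    if not evidence_code:
        raise ValueError("annotation has no evidence code")
    # Assumption: IEA marks electronic annotation; all other codes count as manual.
    return "electronic" if evidence_code == "IEA" else "manual"


def tally(annotations: Iterable[Tuple[str, str]]) -> Counter:
    """Count manual vs electronic annotations from (source, evidence_code) pairs."""
    return Counter(classify_annotation(src, ev) for src, ev in annotations)
```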

6 Manual annotation. High quality: specific gene or gene-product associations are made using peer-reviewed papers and evidence codes. However, manual annotation is time-consuming and requires trained biologists.

7 Manual GO annotation workflow: read papers → find the GO term → annotate it to the protein, recording the PubMed ID and evidence code → GOA association file (Oracle RDBMS) → GO and EBI FTP sites.

8 Protein2GO tool (online)

9 Information captured by GOA, one column per field: Source | GO ID | Term | Evidence | Ref DB | Ref ID | With DB | With ID | Qualifier
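One way to picture a row with these nine fields is a record that can be written to, and read back from, a tab-separated line. This is a hypothetical Python sketch: the class name `GOARecord` and the tab-separated layout are assumptions, not the format of GOA's database or released files.

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class GOARecord:
    """The nine fields listed on slide 9, as plain strings (illustrative)."""
    source: str
    go_id: str
    term: str
    evidence: str
    ref_db: str
    ref_id: str
    with_db: str    # empty when no "with" information applies
    with_id: str
    qualifier: str  # empty, or a qualifier such as "NOT"

    def to_tsv(self) -> str:
        # Empty fields are kept, so the line always has nine columns.
        return "\t".join(astuple(self))

    @classmethod
    def from_tsv(cls, line: str) -> "GOARecord":
        values = line.rstrip("\n").split("\t")
        expected = len(fields(cls))
        if len(values) != expected:
            raise ValueError(f"expected {expected} columns, got {len(values)}")
        return cls(*values)
```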

10 How successful is manual GOA?
Source | No. of annotations | No. of distinct proteins
Proteome Inc. | 22,054 | 6,568
UniProt | 67,910 | 13,697
IntAct | 22,002 | 11,013
MGI | 124,919 | 29,837
SGD | 21,761 | 5,076
FlyBase | 52,386 | 8,775
RGD | 8,036 | 3,369
HGNC | 3,699 | 798
GeneDB | 5,502 | 1,384
TAIR/TIGR | 3,367 | 1,895
ZFIN | 1,012 | 334
Roslin Institute | 14 | 6
AgBase | 889 | 173
Reactome | 15 | 12
WormBase | 893 | 443
TIGR | 139 | 79
Gramene | 1,392 | 812
GDB | 165 | 103
TOTAL MANUAL | 336,237 | 70,728
(July 2006; 111,740 taxa)

11 Electronic annotation. GO terms are assigned to UniProtKB entries on a large scale, using existing information within the database entries together with manual mappings; these annotations get the IEA (Inferred from Electronic Annotation) evidence code. Curated mapping example: EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022. Curated or electronic rule-based mappings from UniProt keywords, HAMAP, InterPro and EC to GO yield high-quality electronic protein-to-GO associations.
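Applying a curated mapping of this kind is essentially a table lookup. The Python sketch below assumes mapping lines written exactly like the example above (an external ID, an optional description, then `> GO:<term name> ; <GO ID>`) and treats lines starting with `!` as comments; the function names and the per-protein cross-reference input are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches e.g. "EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022";
# an optional description may sit between the external ID and the ">".
MAPPING_LINE = re.compile(r"^(?P<ext>\S+).*? > GO:(?P<name>.+) ; (?P<go_id>GO:\d{7})$")


def load_mapping(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Build {external ID: [(GO ID, GO term name), ...]} from mapping lines."""
    mapping: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank and comment lines
        match = MAPPING_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised mapping line: {line!r}")
        mapping.setdefault(match["ext"], []).append((match["go_id"], match["name"]))
    return mapping


def electronic_annotations(protein_xrefs: Dict[str, List[str]],
                           mapping: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Turn each mapped cross-reference into a (protein, GO ID, "IEA") triple."""
    annotations = []
    for protein, xrefs in protein_xrefs.items():
        for xref in xrefs:
            for go_id, _name in mapping.get(xref, []):
                annotations.append((protein, go_id, "IEA"))
    return annotations
```

Cross-references with no mapping simply produce no annotation, which keeps the output limited to curated associations.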

12 www.uniprot.org/

13 Mappings of external concepts to GO: http://www.geneontology.org/GO.indices.shtml

14 InterPro2GO mapping. InterPro is a resource that integrates protein signature databases, e.g. Pfam, PRINTS, PROSITE, ProDom, SMART and TIGRFAMs. It provides a means of classifying proteins into families and identifying domains. Each InterPro entry groups proteins that belong to the same family and potentially have the same function.

15 InterPro2GO mapping is done manually, but with the help of tools. Curators look at the InterPro entry and at the annotation of its proteins. For all Swiss-Prot proteins that truly match the entry, they get statistics on DE (description) lines, keywords and comments; check how conserved the common annotation is; and find the appropriate GO term at the most specific level that applies to all the proteins (not necessarily domains).
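The statistics step, checking how conserved an annotation is across the matching proteins and keeping only GO terms that apply to all of them, can be sketched in Python. This does not describe how SQUID itself works; the keyword-to-GO table and the default threshold of 1.0 (all proteins) are illustrative assumptions, and choosing the most specific term remains a curator's decision.

```python
from collections import Counter
from typing import Dict, List, Set


def annotation_conservation(protein_keywords: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the matching Swiss-Prot proteins that carry each keyword."""
    n = len(protein_keywords)
    if n == 0:
        return {}
    counts = Counter(kw for kws in protein_keywords.values() for kw in kws)
    return {kw: count / n for kw, count in counts.items()}


def candidate_go_terms(protein_keywords: Dict[str, Set[str]],
                       keyword_to_go: Dict[str, str],
                       threshold: float = 1.0) -> List[str]:
    """GO terms whose source keyword is shared by at least `threshold` of the proteins.

    threshold=1.0 keeps only annotation common to all matching proteins.
    """
    conservation = annotation_conservation(protein_keywords)
    return sorted({keyword_to_go[kw] for kw, frac in conservation.items()
                   if frac >= threshold and kw in keyword_to_go})
```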

16 Tools used: "SQUID". Statistics options: keyword, description, gene name, organism, comments, etc.

17 SQUID statistics output

18 SQUID statistics output

19 InterPro2GO mapping in entry

20 InterProScan output with GO terms
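Once an InterPro entry carries GO terms, every protein matched to that entry can inherit them. A minimal Python sketch, assuming the matches and the InterPro2GO table have already been loaded into dictionaries; the input shapes and the function name are hypothetical and do not reflect InterProScan's actual output format.

```python
from typing import Dict, List, Set


def transfer_go_terms(matches: Dict[str, List[str]],
                      interpro2go: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Map each protein to {GO ID: [InterPro entries supporting it]}.

    Proteins whose entries have no GO mapping end up with an empty dict.
    """
    result: Dict[str, Dict[str, List[str]]] = {}
    for protein, entries in matches.items():
        terms: Dict[str, List[str]] = {}
        for ipr in sorted(set(entries)):  # de-duplicate repeated matches
            for go_id in sorted(interpro2go.get(ipr, set())):
                terms.setdefault(go_id, []).append(ipr)
        result[protein] = terms
    return result
```

Keeping the supporting InterPro entries next to each term makes it straightforward to withdraw a term if the mapping for one entry changes.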

21 InterPro2GO sanity checks, run weekly. Reports: obsolete GO terms; obsolete (deleted) IPRs; secondary IPRs.
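A weekly check of this kind amounts to comparing the mapping table against the current GO and InterPro releases. The Python sketch below assumes those releases have already been reduced to a set of obsolete GO IDs, a set of deleted InterPro IDs and a secondary-to-primary accession table; all names are hypothetical.

```python
from typing import Dict, List, Set


def sanity_report(interpro2go: Dict[str, Set[str]],
                  obsolete_go: Set[str],
                  deleted_iprs: Set[str],
                  secondary_to_primary: Dict[str, str]) -> Dict[str, List]:
    """List mapping rows that point at obsolete GO terms or retired InterPro IDs."""
    report: Dict[str, List] = {"obsolete_go_terms": [], "deleted_iprs": [], "secondary_iprs": []}
    for ipr in sorted(interpro2go):
        if ipr in deleted_iprs:
            report["deleted_iprs"].append(ipr)
        if ipr in secondary_to_primary:
            # Report which primary accession the mapping should move to.
            report["secondary_iprs"].append((ipr, secondary_to_primary[ipr]))
        for go_id in sorted(interpro2go[ipr] & obsolete_go):
            report["obsolete_go_terms"].append((ipr, go_id))
    return report
```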

22 Quality of GO mapping.
BioCreAtIvE test set: 635 GO annotations through InterPro2GO (Camon et al., 2005, BMC Bioinformatics)
Category | No. of annotations | %
Exact term | 151 | 24%
Same lineage, < granularity | 273 | 43%
Same lineage, > granularity | 24 | 4%
New lineage | 187 | 29%
Minimal correct | 424 | 67%
Potentially incorrect | 211 | 33%
Manually checked: 44 proteins, 107 predictions. 97 correct (90%): 40 exact and 57 same lineage; 10 new lineage (unknown); 0 incorrect.
Precision: 67-100%
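The percentages and the precision range follow from the four category counts. A Python sketch of the arithmetic; reading the range as "only minimal-correct predictions are right" (lower bound) versus "every potentially incorrect prediction turns out to be right" (upper bound) is an interpretation of the slide, not a statement from the study.

```python
from typing import Dict, Tuple

# Category counts for the 635 InterPro2GO annotations in the BioCreAtIvE test set.
COUNTS = {
    "exact term": 151,
    "same lineage, < granularity": 273,
    "same lineage, > granularity": 24,
    "new lineage": 187,
}


def percentages(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of each category, rounded to whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


def precision_bounds(counts: Dict[str, int]) -> Tuple[int, int]:
    """(lower, upper) precision in percent.

    Lower bound: only exact and less granular same-lineage terms count as correct.
    Upper bound: the potentially incorrect predictions are also assumed correct.
    """
    total = sum(counts.values())
    minimal_correct = counts["exact term"] + counts["same lineage, < granularity"]
    potentially_incorrect = total - minimal_correct
    return (round(100 * minimal_correct / total),
            round(100 * (minimal_correct + potentially_incorrect) / total))
```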

23 InterPro2GO mapping statistics:
Total no. of IPRs mapped to GO: 7,126
% of IPRs mapped to at least 1 GO term: 54%
No. of IPRs mapped to molecular function: 5,741
No. of IPRs mapped to biological process: 5,543
No. of IPRs mapped to cellular component: 3,426
No. of GO terms mapped: 2,811
No. of UniProt proteins mapped through InterPro2GO: 2,006,489 (61%)
% of UniProt covered by InterPro: 77.6%

24 How successful is IEA-GOA in general? It provides large coverage and high quality; however, these annotations often use high-level GO terms and provide little detail.
IEA method | No. of annotations | No. of distinct proteins
InterPro2GO | 6,281,916 | 2,006,489
HAMAP2GO | 199,904 | 85,814
SP Keyword2GO | 3,613,883 | 1,287,830
EC2GO | 207,540 | 202,657
TOTAL | 10,303,243 | 2,167,001
(June 2006; manual annotations for comparison: 336,237 annotations, 70,728 proteins)

25 Total GO statistics:
Total no. of GO annotations: 10,639,480
% GO associations manual: 3.16%
% GO associations electronic: 96.84%
% GO associations through InterPro2GO: 59%
Total no. of proteins annotated to GO: 2,168,717
% of UniProt GO-annotated in total: 68.2%
% of UniProt GO-annotated manually: 2.2%
% of UniProt GO-annotated electronically: 66%
% of UniProt GO-annotated through InterPro2GO: 61%
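The totals on this slide can be cross-checked against the per-method electronic counts (slide 24) and the manual total (slide 10). A short Python check; the variable names are illustrative.

```python
# Per-method IEA annotation counts (June 2006) and the manual total (July 2006).
ELECTRONIC = {
    "InterPro2GO": 6_281_916,
    "HAMAP2GO": 199_904,
    "SP Keyword2GO": 3_613_883,
    "EC2GO": 207_540,
}
MANUAL = 336_237

total_electronic = sum(ELECTRONIC.values())
total = total_electronic + MANUAL

# Shares of all GO annotations, in percent.
manual_share = 100 * MANUAL / total
electronic_share = 100 * total_electronic / total
interpro2go_share = 100 * ELECTRONIC["InterPro2GO"] / total

print(f"total annotations: {total:,}")
print(f"manual {manual_share:.2f}%, electronic {electronic_share:.2f}%, "
      f"InterPro2GO {interpro2go_share:.0f}%")
```

This prints a total of 10,639,480 annotations, split 3.16% manual and 96.84% electronic, with 59% coming through InterPro2GO.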

26 GOA data flow: gene association files

27 Gene Association file format: http://www.geneontology.org/GO.annotation.shtml
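A reader for tab-delimited gene association files can be sketched in a few lines of Python. The column order below follows the 15-column GAF 1.0 layout documented by the GO Consortium (later GAF 2.x files append further columns, which this sketch drops); the function name and the choice to split only the multi-valued columns into lists are illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Union

# Column order of a GAF 1.0 gene association file.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
]
# Columns that may contain several values separated by "|".
MULTI_VALUED = ("qualifier", "db_reference", "with_from", "db_object_synonym", "taxon")

Record = Dict[str, Union[str, List[str]]]


def parse_gaf(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per annotation line, skipping "!" header/comment lines."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("!") or not line.strip():
            continue
        values = line.rstrip("\n").split("\t")
        if len(values) < len(GAF_COLUMNS):
            raise ValueError(f"line {number}: expected {len(GAF_COLUMNS)} columns, got {len(values)}")
        # zip stops at 15 columns, so any extra GAF 2.x columns are dropped.
        record: Record = dict(zip(GAF_COLUMNS, values))
        for column in MULTI_VALUED:
            raw = values[GAF_COLUMNS.index(column)]
            record[column] = [v for v in raw.split("|") if v]
        yield record
```

Because it accepts any iterable of lines, the same function works on an open file handle or on a list of strings.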

28 Example GOA cow file

29 Output from the GOA database: gene association files for GOA Cow (new), Redundant, and Non-Redundant (based on IPI). Data are also available in SRS, UniProt, QuickGO, MODs, Ensembl, etc. GA slim for UniProt + GO slims.

30 GA files for non-redundant species. A non-redundant, complete protein set is identified for each proteome (>25% GO coverage). The files include UniProt, IPI and MOD-specific IDs, e.g. mouse (MGI), rat (RGD), zebrafish (ZFIN), etc. Xref files are available with identifiers from UniProt, IPI, RefSeq, Ensembl, UniGene, etc. ftp://ftp.ebi.ac.uk/pub/databases/GO/goa and ftp://ftp.ebi.ac.uk/pub/databases/integr8
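The coverage criterion can be expressed as the fraction of a proteome's proteins that carry at least one GO annotation. The Python sketch below reads "(>25% GO coverage)" as a strict threshold for producing a non-redundant file; that reading, the function names and the inputs are assumptions.

```python
from typing import Dict, List, Set


def go_coverage(proteome: Set[str], annotated: Set[str]) -> float:
    """Fraction of a proteome's proteins with at least one GO annotation."""
    if not proteome:
        return 0.0
    return len(proteome & annotated) / len(proteome)


def species_with_nonredundant_files(proteomes: Dict[str, Set[str]],
                                    annotated: Set[str],
                                    min_coverage: float = 0.25) -> List[str]:
    """Species whose GO coverage is strictly above `min_coverage` (assumed rule)."""
    return sorted(species for species, proteins in proteomes.items()
                  if go_coverage(proteins, annotated) > min_coverage)
```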

31 Uses of GOA data: access protein functional information; look at relationships between proteins, e.g. IntAct; connect biological information to gene expression data; determine the functional composition of a proteome using GO slim.
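Summarizing a proteome with a GO slim means mapping each specific GO term up the ontology to the slim terms it falls under, then counting proteins per slim term. A minimal Python sketch, assuming the ontology is supplied as a child-to-parents dictionary; the real GO graph distinguishes relationship types, which this sketch ignores, and the names and inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term: str, parents: Dict[str, Set[str]], slim: Set[str]) -> Set[str]:
    """Slim terms covering `term` (the term itself counts if it is in the slim)."""
    return ({term} | ancestors(term, parents)) & slim


def slim_profile(protein_terms: Dict[str, Set[str]],
                 parents: Dict[str, Set[str]], slim: Set[str]) -> Counter:
    """Number of proteins annotated under each slim term."""
    profile: Counter = Counter()
    for terms in protein_terms.values():
        covered: Set[str] = set()
        for term in terms:
            covered |= map_to_slim(term, parents, slim)
        profile.update(covered)  # each protein counts at most once per slim term
    return profile
```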

32 Uses of GOA: find functional information on proteins (http://www.ebi.ac.uk/ego)

33 Uses of GOA: find functional information on interacting proteins (IntAct, http://www.ebi.ac.uk/intact)

34 Uses of GOA: overview of a proteome with GO slim (http://www.ebi.ac.uk/integr8)

35 Uses of GOA: GO classification, i.e. analysis of high-throughput data according to GO, such as microarray data analysis and proteomics data analysis (Kislinger T et al., Mol Cell Proteomics, 2003; Larkin JE et al., Physiol Genomics, 2004; Cunliffe HE et al., Cancer Res, 2003).
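One common way to analyse a high-throughput gene or protein list according to GO is to ask whether a GO term is over-represented in the list compared with the whole annotated population, using a one-sided hypergeometric test. This Python sketch is a swapped-in illustration, not necessarily the method used in the studies cited above; the function names and inputs are hypothetical.

```python
from math import comb
from typing import Dict, Set


def hypergeometric_tail(population: int, annotated: int, drawn: int, observed: int) -> float:
    """P(X >= observed) where X ~ Hypergeometric(population, annotated, drawn)."""
    upper = min(annotated, drawn)
    hits = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(observed, upper + 1))
    return hits / comb(population, drawn)


def term_enrichment(study: Set[str], population: Set[str],
                    annotations: Dict[str, Set[str]], go_id: str) -> float:
    """One-sided p-value for `go_id` being over-represented in `study`.

    `annotations` maps each protein to its GO IDs; `study` must be a subset of `population`.
    """
    with_term = {p for p in population if go_id in annotations.get(p, set())}
    return hypergeometric_tail(len(population), len(with_term),
                               len(study), len(study & with_term))
```

When many GO terms are tested at once, the resulting p-values need a multiple-testing correction before any term is called enriched.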

36 Future plans: continue deep-level annotation of human, mouse and rat; manually annotate splice variants; outreach and inclusion of new datasets, e.g. grape; new electronic mappings, e.g. unipathway2go; ortholog prediction for electronic GO annotation; develop tools for annotation training.

37 Acknowledgements: Evelyn Camon (GOA Coordinator); Daniel Barrell (GOA Programmer); Emily Dimmer (GOA Curator); Rachael Huntley (GOA Curator); David Binns & John Maslen (QuickGO, GOA tools); all EBI UniProtKB Curators, HAMAP (SIB), IntAct, GO Editorial Office @ EBI; all GO Consortium & associate members; Rolf Apweiler (Head of sequence database group).

