Insight into GO and GOA Angelica Tulipano, INFN Bari CNR

Presentation transcript:

Insight into GO and GOA
Angelica Tulipano, INFN Bari CNR
Giulia De Sario, ITB Bari CNR
Andreas Gisel, ITB Bari CNR
EMBRACE Workshop on ‘Applied Gene Ontology’, Bari, Italy, 7-9 November 2007

GODB
GO_200709: 3.3 million gene products, more than 100000 organisms, 25000 GO terms, 14.5 million associations
The GO Database, which comprises both ontology and annotation data, is built from the flat files available on the GO website and can be downloaded in MySQL or RDF/XML format.
termdb: ontologies, definitions and mappings to other databases
assocdb: the above, plus associations to gene products
seqdb: the above, plus protein sequences for some of the gene products
seqdblite: the above, with IEA associations stripped out (this is the version that drives AmiGO)

GO tree - path
Total terms: 24955
Terms without children: 14974 (60%)
Terms with children: 10019 (40%)

GO tree - path
Number of distinct paths: 261034
Average number of paths per end term: 17
Maximum number of paths per end term: 851

GO tree - path
Average path length: 10
Maximum path length: 18
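The path statistics on these slides can be reproduced in principle by enumerating every route from an end term (a term without children) up to the root. The following is a minimal sketch on a hypothetical six-term toy ontology, not the GO itself. The `PARENTS` mapping, the function names, and the choice to measure path length in terms (root included) are all assumptions made for illustration; the slides do not say how length was counted.

```python
# Hypothetical toy ontology (NOT real GO data): each term maps to its parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["C"],
    "E": ["C", "B"],
}


def paths_to_root(term):
    """Return every path from `term` up to the root, as lists of term names."""
    parents = PARENTS[term]
    if not parents:
        return [[term]]
    return [[term] + rest for parent in parents for rest in paths_to_root(parent)]


def end_terms():
    """Terms that never appear as a parent, i.e. terms without children."""
    with_children = {p for parents in PARENTS.values() for p in parents}
    return sorted(t for t in PARENTS if t not in with_children)


def path_statistics():
    """Collect the same kind of figures the slides report for the GO."""
    per_term = {t: paths_to_root(t) for t in end_terms()}
    all_paths = [p for paths in per_term.values() for p in paths]
    # Assumption: path length is the number of terms on the path, root included.
    lengths = [len(p) for p in all_paths]
    return {
        "end_terms": len(per_term),
        "distinct_paths": len(all_paths),
        "avg_paths_per_end_term": len(all_paths) / len(per_term),
        "max_paths_per_end_term": max(len(p) for p in per_term.values()),
        "avg_path_length": sum(lengths) / len(lengths),
        "max_path_length": max(lengths),
    }


if __name__ == "__main__":
    for key, value in path_statistics().items():
        print(f"{key}: {value}")
```

Because a term with several parents multiplies the number of routes to the root, the number of distinct paths can greatly exceed the number of terms, which is consistent with 261034 paths for fewer than 25000 terms.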

GO tree - path
The GO is very broad and holds a large body of knowledge that can be associated with gene products; however, its paths are quite short.

Gene product description
BCL2_HUMAN
3 million gene products (UniProt) are described by 47636 descriptions

Gene product description
BCL2_HUMAN
GODB version go_09_07
Gene products per description   Descriptions   Gene products
1                               25119          25119
2-10                            13548          52731
11-50                           4499           105762
51-100                          1296           93586
101-500                         2118           492847
501-1000                        545            377029
1000-77069                      431            1876746
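A table like the one above can be built by counting how many gene products share each description and then binning those counts. The sketch below uses a handful of made-up description strings; the sample data, the function names, and the open-ended last bin (`>1000`, where the slide uses 1000-77069) are assumptions for illustration only.

```python
from collections import Counter

# Bin edges modelled on the slide's table: (label, lowest count, highest count).
# The last bin is left open-ended here, which is an assumption.
BINS = [
    ("1", 1, 1),
    ("2-10", 2, 10),
    ("11-50", 11, 50),
    ("51-100", 51, 100),
    ("101-500", 101, 500),
    ("501-1000", 501, 1000),
    (">1000", 1001, None),
]


def bin_label(n):
    """Return the label of the bin that a per-description count falls into."""
    for label, low, high in BINS:
        if n >= low and (high is None or n <= high):
            return label
    raise ValueError(f"count must be a positive integer, got {n!r}")


def summarize(descriptions):
    """Map bin label -> [number of descriptions, number of gene products].

    `descriptions` holds one description string per gene product.
    """
    per_description = Counter(descriptions)
    table = {label: [0, 0] for label, _, _ in BINS}
    for count in per_description.values():
        row = table[bin_label(count)]
        row[0] += 1       # one more distinct description in this bin
        row[1] += count   # all gene products that carry this description
    return table


if __name__ == "__main__":
    # Hypothetical sample: 16 gene products sharing 3 distinct descriptions.
    sample = ["description X"] * 3 + ["description Y"] * 12 + ["description Z"]
    for label, (n_desc, n_products) in summarize(sample).items():
        print(f"{label:>9}  {n_desc:>5}  {n_products:>5}")
```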

GOA
Evidence description                                  Code   Type
Inferred by Curator                                   IC     experimental
Inferred from Direct Assay                            IDA    experimental
Inferred from Electronic Annotation                   IEA    computational
Inferred from Expression Pattern                      IEP    experimental
Inferred from Genetic Interaction                     IGI    experimental
Inferred from Mutant Phenotype                        IMP    experimental
Inferred from Physical Interaction                    IPI    experimental
Inferred from Sequence or Structural Similarity       ISS    computational
Non-traceable Author Statement                        NAS    experimental
No biological Data available                          ND     --
Inferred from Reviewed Computational Analysis         RCA    computational
Traceable Author Statement                            TAS    experimental
Not Recorded                                          NR     --

GOA
Evidence description                                  Code   Type   Count      Share
Inferred by Curator                                   IC     exp    388
Inferred from Direct Assay                            IDA    exp    10888
Inferred from Electronic Annotation                   IEA    comp   15419002   99.5%
Inferred from Expression Pattern                      IEP    exp    392
Inferred from Genetic Interaction                     IGI    exp    145
Inferred from Mutant Phenotype                        IMP    exp    1646
Inferred from Physical Interaction                    IPI    exp    7517
Inferred from Sequence or Structural Similarity       ISS    comp   16759
Non-traceable Author Statement                        NAS    exp    10811
No biological Data available                          ND     --     3386
Inferred from Reviewed Computational Analysis         RCA    comp   107
Traceable Author Statement                            TAS    exp    19463
Not Recorded                                          NR     --     1185
Total associations                                                  13402670
Total (sum of the counts above)                                     15491689   100%
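The 99.5% share of electronic annotations follows directly from the counts on this slide. As a small check, the sketch below sums the per-code counts and computes the share of each evidence type. The counts and the experimental/computational labels are copied from the slides; the variable and function names are illustrative.

```python
# Evidence-code counts as listed on the slide.
EVIDENCE_COUNTS = {
    "IC": 388, "IDA": 10888, "IEA": 15419002, "IEP": 392, "IGI": 145,
    "IMP": 1646, "IPI": 7517, "ISS": 16759, "NAS": 10811, "ND": 3386,
    "RCA": 107, "TAS": 19463, "NR": 1185,
}

# Evidence types as assigned on the slides ("--" means no type was given).
EVIDENCE_TYPE = {
    "IC": "experimental", "IDA": "experimental", "IEA": "computational",
    "IEP": "experimental", "IGI": "experimental", "IMP": "experimental",
    "IPI": "experimental", "ISS": "computational", "NAS": "experimental",
    "ND": "--", "RCA": "computational", "TAS": "experimental", "NR": "--",
}


def share_by_type(counts):
    """Return (total count, {evidence type: percentage of total})."""
    total = sum(counts.values())
    by_type = {}
    for code, n in counts.items():
        kind = EVIDENCE_TYPE[code]
        by_type[kind] = by_type.get(kind, 0) + n
    return total, {kind: 100 * n / total for kind, n in by_type.items()}


if __name__ == "__main__":
    total, shares = share_by_type(EVIDENCE_COUNTS)
    print("sum of counts:", total)
    print(f"IEA share: {100 * EVIDENCE_COUNTS['IEA'] / total:.1f}%")
    for kind, pct in shares.items():
        print(f"{kind}: {pct:.2f}%")
```

The sum of the thirteen counts equals 15491689, the figure shown against 100% on the slide, so experimentally supported annotations make up well under 1% of the total.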

GOA p-value
p(term) is the number of gene products associated with a term or any of its children, divided by the total number of associations between GO terms and gene products. The smaller p(term) is, the higher the information content and the more detailed the description.
name                      level   term-type            count     p(term)
molecular_function        1       molecular_function   3471081   0.477303
biological_process        1       biological_process   2243629   0.308518
cellular_component        1       cellular_component   1864423   0.256374
hormone activity          4       molecular_function   4504      0.000619338
gliogenesis               4       biological_process   95        1.30633e-05
cell fate specification   5       biological_process   204       2.80517e-05
angiogenesis              7       biological_process   363       4.99156e-05
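To make the relation between p(term) and information content concrete, here is a minimal sketch that recomputes p(term) for a few rows of the table and converts it to an information content. Two things are assumptions and not taken from the slides: the denominator `TOTAL_ASSOCIATIONS` is back-calculated from the table as count / p(term), and the information content is taken to be -log2 p(term), a definition commonly used for ontology terms.

```python
import math

# Counts copied from the slide's table.
COUNTS = {
    "molecular_function": 3471081,
    "hormone activity": 4504,
    "gliogenesis": 95,
    "cell fate specification": 204,
    "angiogenesis": 363,
}

# ASSUMPTION: the slides do not state the denominator. This value is
# back-calculated from the table (count / p(term)) and is approximate.
TOTAL_ASSOCIATIONS = 7_272_280


def p_term(count, total=TOTAL_ASSOCIATIONS):
    """Fraction of all associations that fall on a term or its children."""
    return count / total


def information_content(p):
    """ASSUMPTION: information content defined as -log2 of p(term)."""
    return -math.log2(p)


def report():
    """Return (name, p(term), information content), least informative first."""
    rows = []
    for name, count in COUNTS.items():
        p = p_term(count)
        rows.append((name, p, information_content(p)))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for name, p, ic in report():
        print(f"{name:<25} p={p:.6g}  IC={ic:.2f} bits")
```

Under these assumptions a root term such as molecular_function carries about one bit of information, while a rarely used term such as gliogenesis carries far more, which matches the statement that a smaller p(term) means a more detailed description.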

GOA delta p-value

GOA p-value
One would expect the information content to increase linearly along a path.
Annotations and the choice of GO terms should be re-evaluated in light of such studies.

GO - GOA
Important knowledge for a better understanding of biological data.
There is an urgent need to collect and incorporate existing information, especially from non-model organisms.
Thanks!