Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI 2006-04846.



1. Some of what we’ve been doing: confirmation of predicted/hypothetical proteins in chicken.
2. Something of more interest to almost everyone in here for analyzing your data.

Educate researchers who need to use GO.
University of Delaware, November, ……
Currently working with researchers from the Universities of Delaware and Maryland to provide GO annotations necessary to facilitate publication of array data.
First residential workshop at MSU in May.

Avian Genome Conference May, 2008 GO Annotation Jamboree May, 2008

“Hypothetical” and “predicted” proteins
Tissues: naive and activated purified CD4+ T cells; transformed CD4+ T cells; spleen; brain tissues; bursal B and stromal cells; muscle; and serum.
Database of all predicted proteins from chicken build 2.1, using DFF-2D LC MS2 and our computational pipeline.
We experimentally confirmed 7,809 chicken predicted proteins; 52% were expressed in more than one tissue.
6,027 (77%) of these proteins mapped to human and mouse orthologs, and we assigned standardized nomenclature to 5,326 (64%).
We made 8,213 GO associations to 21% of the identified chicken proteins, using the ISS evidence code to transfer function between human-chicken and human-mouse orthologs. This increased the current chicken GO annotations by 8% and doubled the number of manually curated chicken annotations.
The data are in the PRIDE and NCBI databases and are being used at NCBI to promote XP (computational model) accessions to NP (confirmed product) accessions, i.e. the words “hypothetical” and “predicted” are removed.
We also add experimentally derived cell component GO annotations.
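The orthology-based transfer described above can be sketched in a few lines. This is a minimal illustration, not the AgBase pipeline: the `Annotation` record, the `transfer_by_orthology` function, and all protein identifiers are hypothetical, and the GO term is used only as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    protein: str        # annotated gene product (hypothetical ID)
    go_id: str          # GO term identifier
    evidence: str       # GO evidence code
    with_id: str = ""   # ortholog the annotation was transferred from


def transfer_by_orthology(orthologs, source_annotations):
    """Copy each ortholog's GO terms to the chicken protein, tagged ISS."""
    transferred = []
    for chicken_id, ortholog_id in orthologs.items():
        for ann in source_annotations.get(ortholog_id, []):
            transferred.append(
                Annotation(chicken_id, ann.go_id, "ISS", ortholog_id)
            )
    return transferred


# Hypothetical 1:1 ortholog map and source annotations.
orthologs = {"chicken_A": "human_A", "chicken_B": "human_B"}
source_annotations = {
    "human_A": [Annotation("human_A", "GO:0005634", "IDA")],
}

result = transfer_by_orthology(orthologs, source_annotations)
for ann in result:
    print(ann.protein, ann.go_id, ann.evidence, "from", ann.with_id)
```

Proteins whose ortholog has no annotations (here `chicken_B`) simply receive none, which mirrors the fact that only a fraction of the confirmed proteins gained GO associations.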

Figure: Tissue distribution of expressed ‘predicted’ proteins.
Pie chart: proteins found in one, two, three, four, five, six, seven, or all eight tissues. Values as listed: 48% (3,779); 1% (61); 4% (313); 7% (561); 26% (2,020); 14% (1,073); 0% (0); 0% (2).
Bar chart: number of proteins by tissue type (spleen, UA01, stroma, T cells, B cells, serum, muscle, brain), split into tissue-specific proteins and proteins identified in other tissues.

Figure: chicken proteins with 1:1 human/mouse orthologs. 236 Mouse orthologs; Human orthologs 5,; No human or mouse orthologs 1,784.

Figure: Cumulative external visits to AgBase, by month.

Summary of GO annotations for last 12 months
11,716 GO annotations for chicken & cow:
214 cow gene products GO annotated (1,521 GO annotations)
1,762 chicken gene products GO annotated (10,194 GO annotations)
In addition, orthology with human and mouse genes was used to GO annotate 7,809 computationally ‘predicted’ chicken proteins (8,213 GO annotations).

Annotation metrics

Figure: Database distribution of AgBase GO annotations (AgBase Community file and GO Consortium file) for chicken and cow, Dec '07.

GO Annotation of Arrays

Genomic Annotation
Structural annotation, including Sequence Ontology
Functional annotation using Gene Ontology
Nomenclature (species’ genome nomenclature committees)
Other annotations using other bio-ontologies, e.g. Anatomy Ontology

Quality improvement of annotations: pre-annotation vs. re-annotation

GO annotation of arrays
Array IDs: ‘known’ genes from public databases; ‘predicted’ genes from genome sequencing
Structural mapping to gene product IDs
Is functional literature available? YES: GO annotation of literature
Are strict mammalian orthologs available? YES: GO annotation from orthologs (ISO)
NO: electronic GO annotation using InterPro data (IEA)
Collate GO annotations
Submit to EBI-GOA, GOC
Link to array IDs (updateable)
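One reading of the flowchart above, written as a decision function. The order of the two questions is an assumption (the flattened slide does not preserve the arrows), and `annotation_route` and its arguments are hypothetical names introduced only for illustration.

```python
def annotation_route(has_functional_literature, has_strict_mammalian_ortholog):
    """Pick an annotation route for one gene product.

    Assumed order: literature first, then orthologs, then electronic
    annotation from InterPro data as the fallback.
    """
    if has_functional_literature:
        return "literature"   # GO annotation of literature
    if has_strict_mammalian_ortholog:
        return "ISO"          # GO annotation from orthologs
    return "IEA"              # electronic annotation using InterPro data


# Hypothetical gene products: (has literature, has strict ortholog).
gene_products = {
    "product_1": (True, True),
    "product_2": (False, True),
    "product_3": (False, False),
}

routes = {name: annotation_route(*flags) for name, flags in gene_products.items()}
print(routes)
```

After routing, the resulting annotations would be collated, submitted, and linked back to the array IDs, as the remaining flowchart steps describe.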

AgBase: annotating arrays
1. Del-Mar 14K Chicken Integrated Systems microarray (GPL1731)
14,053 chicken genes represented
9,587 contigs GO annotated (CC: 3,514; MF: 6,640; BP: 4,623)
3,101 singletons GO annotated (CC: 487; MF: 881; BP: 646)
Many singletons map to chicken ESTs with no associated GO.

Figure 1A: Biological Process associated with Del-Mar 14K array.
Categories: metabolic process; transport; cell communication; development; immune response; cell death; cell differentiation; response to stress; sensory perception; cell motility; regulation of biological process; cellular organization and biogenesis; behavior; response to chemical stimulus; process unknown.

Figure: Relative amount of GO BP associated with Del-Mar 14K array compared to total chicken GO.
Axes: GO Biological Processes vs. array GO / total chicken GO.
Terms: development; immune response; cell death; response to stress; process unknown; cell motility; cell differentiation; behavior; transport; regulation of biological process; sensory perception; response to chemical stimulus; secretion; cellular organization and biogenesis; response to stimulus; metabolic process; cell communication.
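The quantity plotted here is a per-term ratio: annotations on the array divided by annotations for the whole chicken GO set. A minimal sketch of that calculation follows; the function name and all counts are hypothetical and are not taken from the figure.

```python
def relative_representation(array_counts, genome_counts):
    """Return array GO / total GO per term, highest ratio first."""
    ratios = {}
    for term, total in genome_counts.items():
        if total > 0:  # skip terms with no genome-wide annotations
            ratios[term] = array_counts.get(term, 0) / total
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical annotation counts per GO biological process term.
array_counts = {"immune response": 30, "transport": 50}
genome_counts = {"immune response": 40, "transport": 200, "behavior": 10}

ratios = relative_representation(array_counts, genome_counts)
for term, ratio in ratios.items():
    print(f"{term}: {ratio:.2f}")
```

A ratio near 1 means the array covers most of what is annotated for that process; a low ratio flags a process the array can say little about.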

AgBase: annotating arrays
2. TAMU Agilent 44K chicken array
Approx. 44,000 chicken genes represented
Added GO annotation for 8,731 chicken gene products
Many of the array IDs with no associated GO annotation map to chicken EST sequences.

AgBase: annotating arrays
3. FHCRC Chicken 13K v2.0 (GPL1836)
13,007 chicken genes represented
2,491 array IDs mapped to chicken gene products & GO annotated
628 mapped to chicken gene products with no GO
Approx. 2,000 array IDs mapped to human or mouse gene products with GO annotation

GO Annotation Quality Score: “GAQ”
GAQ: no. of annotations; DAG depth; GO evidence code
Calculate an overall GAQ score for any dataset (e.g. an array)
Calculate GAQ for subsets (e.g. biological processes studied using arrays)
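The slide names the three ingredients of GAQ but not how they are combined. The sketch below assumes one plausible combination, summing DAG depth times an evidence-code weight over all annotations, so that more, deeper, and better-supported annotations all raise the score. Both the formula and the weights in `EVIDENCE_WEIGHT` are assumptions made for illustration.

```python
# Hypothetical weights: experimental evidence ranks above inferred evidence.
EVIDENCE_WEIGHT = {"IDA": 5, "IMP": 5, "ISS": 3, "IEA": 2, "ND": 1}


def gaq_score(annotations):
    """Sum depth * evidence weight over (dag_depth, evidence_code) pairs."""
    return sum(depth * EVIDENCE_WEIGHT[code] for depth, code in annotations)


# Hypothetical gene products, each a list of (DAG depth, evidence code).
sparse_gene = [(2, "IEA")]
well_annotated_gene = [(6, "IDA"), (5, "IMP"), (4, "ISS")]

print(gaq_score(sparse_gene))
print(gaq_score(well_annotated_gene))
```

Scoring a whole dataset or a subset is then a matter of passing in the corresponding annotations, which matches the two uses listed on the slide.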

“Gene Ontology” “Biological Process”
Evidence Code:
IEA: inferred from electronic annotation
ISS: inferred from sequence similarity
IMP: inferred from mutant phenotype
IGI: inferred from genetic interaction
IPI: inferred from physical interaction
IDA: inferred from direct assay
IEP: inferred from expression pattern
TAS: traceable author statement
NAS: non-traceable author statement
ND: no biological data available
RCA: inferred from reviewed computational analysis
IC: inferred by curator
Your Favorite Gene: low GAQ score → Your NEW Favorite Gene: high GAQ score

Quantification of re-annotation
Metrics:
Granularity: # previous annotations; # re-annotations
Specificity: # chicken annotations; # human/mouse annotations
Quality: Gene Annotation Quality (GAQ) score

Figure: number of annotations by annotation type (whole array, chicken, human/mouse), pre-annotation vs. re-annotation, with granularity and specificity panels.
Callouts: 300% increase; 50% increase; 700% increase. 13% of previous annotations to other species were corrected to chicken-specific annotations.
Bart van den Berg, CVM MSU / Sue Lamont and Huaijun Zhu

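GAQ score summary
Depth: pre-annotation 87,250; re-annotation ,184
Confidence score total: pre-annotation 39,355; re-annotation ,537
Total # proteins (Breadth): pre-annotation 886; re-annotation 4,240; fold difference 4.8
Total GAQ score: pre-annotation 207,869; re-annotation 579,599; fold difference 2.8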
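The two fold differences that survive in the summary can be checked directly: each is the re-annotation value divided by the pre-annotation value. The helper name `fold_difference` is introduced here only for illustration.

```python
def fold_difference(pre_annotation, re_annotation):
    """Ratio of the re-annotation value to the pre-annotation value."""
    return re_annotation / pre_annotation


# Values taken from the GAQ score summary.
total_gaq_fold = fold_difference(207_869, 579_599)
breadth_fold = fold_difference(886, 4_240)

print(round(total_gaq_fold, 1))  # 2.8
print(round(breadth_fold, 1))    # 4.8
```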

Quality improvement of annotations: pre-annotation vs. re-annotation

Figure: GO biological process annotations.
Axes: GO term vs. relative difference (microarray GO / total chicken GO).
Terms: cell communication; metabolic process; catabolic process; transport; regulation of biological process; macromolecule metabolic process; biological_process; cell motility; response to stimulus; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; cell differentiation; cell death; multicellular organismal development.

Modeling using the GO
Functional Understanding: Implied; Derived
Physiology (= Cellular Component + Biological Process + Molecular Function)
Network Modeling; Gene Ontology (interactions)

Hypothesis-driven GO-based data interrogation Buza, J. J. and S.C. Burgess. Modeling the proteome of a Marek's disease transformed cell line: a natural animal model for CD30 over-expressing lymphomas. Proteomics, :
