COG and GO tutorial.



The Clusters of Orthologous Groups (COGs) Database

The protein database of Clusters of Orthologous Groups (COGs) is an attempt to phylogenetically classify the complete complement of proteins (both predicted and characterized) encoded by complete genomes. Each COG is a group of three or more proteins that are inferred to be orthologs, i.e., direct evolutionary counterparts.
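
The slide does not say how orthology is inferred. As a rough illustration only, the sketch below assumes the triangle-of-best-hits idea described in the COG literature: proteins from three different genomes are grouped when each is the best hit of the others. The toy best-hit table, the reciprocal-hit test, and the function names are all assumptions made for this example, not the COG pipeline itself.

```python
from itertools import combinations

# Toy best-hit table (hypothetical data, not real BLAST output).
# Key: (query protein, target genome); value: best-scoring protein there.
# Proteins are named "<genome>:<id>".
BEST_HITS = {
    ("A:p1", "B"): "B:q1", ("A:p1", "C"): "C:r1",
    ("B:q1", "A"): "A:p1", ("B:q1", "C"): "C:r1",
    ("C:r1", "A"): "A:p1", ("C:r1", "B"): "B:q1",
    ("A:p2", "B"): "B:q1",  # one-way hit only, so it cannot close a triangle
}


def genome_of(protein):
    """Return the genome prefix of a protein name such as 'A:p1'."""
    return protein.split(":", 1)[0]


def is_mutual_best_hit(x, y, best_hits):
    """True if x's best hit in y's genome is y, and the reverse also holds."""
    return (best_hits.get((x, genome_of(y))) == y
            and best_hits.get((y, genome_of(x))) == x)


def find_triangles(best_hits):
    """List trios of proteins from three genomes that are all mutual best hits."""
    proteins = sorted({p for p, _ in best_hits} | set(best_hits.values()))
    triangles = []
    for trio in combinations(proteins, 3):
        if len({genome_of(p) for p in trio}) < 3:
            continue  # a candidate group must span three different genomes
        if all(is_mutual_best_hit(x, y, best_hits)
               for x, y in combinations(trio, 2)):
            triangles.append(trio)
    return triangles


print(find_triangles(BEST_HITS))
```

In this toy table only A:p1, B:q1 and C:r1 form a triangle; A:p2 is left out because B:q1 does not point back to it.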

An example

Ontologies for Molecular Biology

“Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber 1993)

Gene Ontologies (GO): ontologies for molecular biology domains, developed and supported by the Gene Ontology Consortium for gene and gene product annotations for all organisms.

Using accession IDs to identify common objects is not enough to achieve integration. An ontology provides machine-interpretable definitions of the basic concepts in a domain and the relations among them, and it defines a common vocabulary for researchers who need to share information in that domain. Biological terms such as ‘ectoderm determination’ need to be defined so that concepts can be communicated. These concept definitions need to be valid for all relevant organisms (plants, animals and microbial systems) if we are to exploit the information we have to achieve a greater understanding of cellular functions.

term: ectoderm specification
GO id: NEW
definition: The processes involved in the specification of cell identity in the ectoderm. Once specification has taken place, a cell will be committed to differentiate down a specific pathway if left in its normal environment.
definition_reference: GO:curators

Gene Ontology Objectives

GO represents concepts used to classify specific parts of our biological knowledge:
Biological Process
Molecular Function
Cellular Component

GO develops a common language applicable to any organism.
GO terms can be used to annotate gene products from any species, allowing comparison of information across species.

GO is the designation of a project as well as the product of that project. Starting with the cellular level, we do not distinguish cell types, organs, etc. Gene Ontology is a collaboration between the fly (FlyBase), mouse (MGD), and yeast (SGD) genome databases. All three groups had started independent projects to produce controlled vocabularies for the biology of their organisms. You will all be familiar with hierarchical systems for classifying enzymes (EC) or functions (YPD, SwissPROT, MIPS, …). We have divided our project into the creation of three ontologies. These are not necessarily hierarchical; rather, they can be a network of associations, a directed acyclic graph (DAG).

Process: cell cycle, nutrient transport, behavior
Function: alcohol dehydrogenase
Cellular Location: organelle, protein complex, subcellular compartment

What GO is NOT:

Not a way to unify biological databases
Not a dictated standard
Not a database of gene products, protein domains, or motifs
Does not define evolutionary relationships

The 3 Gene Ontologies

Molecular Function = elemental activity/task
The tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity.

Biological Process = biological goal or objective
Broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions.

Cellular Component = location or complex
Subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme.

Terms, Definitions, IDs

term: MAPKKK cascade (mating sensu Saccharomyces)
goid: GO:0007244
definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.
definition_reference: PMID:9561267
comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
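
A term record like the one above is simple enough to read programmatically. The sketch below is a minimal illustration, assuming one `key: value` field per line as laid out in the `RECORD` string; the function names are hypothetical and this is not the parser of any GO tool. It shows how the OBSOLETE marker and the replacement ID mentioned in the comment could be picked up.

```python
import re

# The record from the slide, written with one field per line (an assumption
# about the layout; the field names are those shown on the slide).
RECORD = """\
term: MAPKKK cascade (mating sensu Saccharomyces)
goid: GO:0007244
definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.
definition_reference: PMID:9561267
comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
"""

GO_ID = re.compile(r"GO:\d{7}")  # GO identifiers: "GO:" followed by 7 digits


def parse_record(text):
    """Split each non-empty line at the first ': ' into a field name and value."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        record[key.strip()] = value.strip()
    return record


def is_obsolete(record):
    """Obsolete terms carry the word OBSOLETE at the start of the definition."""
    return record.get("definition", "").startswith("OBSOLETE")


def suggested_replacements(record):
    """Return GO IDs mentioned in the comment, excluding the term's own ID."""
    own_id = record.get("goid")
    found = GO_ID.findall(record.get("comment", ""))
    return [go_id for go_id in found if go_id != own_id]


record = parse_record(RECORD)
print(record["goid"], is_obsolete(record), suggested_replacements(record))
```

Annotations that still point at an obsolete ID can then be flagged for update to the term suggested in the comment.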

Directed Acyclic Graph EBI GOA
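
To make the directed acyclic graph idea concrete, here is a small sketch in which a term can have more than one parent, which is what separates a DAG from a strict hierarchy. The term names are taken from the cellular component examples in this tutorial, but the parent links are illustrative assumptions and are not copied from the ontology; the function names are hypothetical.

```python
# Child -> list of parents. The edges are illustrative, NOT the real GO graph.
PARENTS = {
    "cellular component": [],
    "organelle": ["cellular component"],
    "protein complex": ["cellular component"],
    "nucleus": ["organelle"],
    # Two parents: this is what a tree cannot express but a DAG can.
    "RNA polymerase II holoenzyme": ["protein complex", "nucleus"],
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_acyclic(parents):
    """A graph is acyclic if no term appears among its own ancestors."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("RNA polymerase II holoenzyme", PARENTS)))
print(is_acyclic(PARENTS))
```

Because a gene product annotated to a term is implicitly associated with all of that term's ancestors, walking upward like this is the basic operation behind grouping annotations at a coarser level.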

Evidence Codes for GO Annotations http://www.geneontology.org/doc/GO.evidence.html

IEA Inferred from Electronic Annotation
ISS Inferred from Sequence Similarity
IEP Inferred from Expression Pattern
IMP Inferred from Mutant Phenotype
IGI Inferred from Genetic Interaction
IPI Inferred from Physical Interaction
IDA Inferred from Direct Assay
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
NAS Non-traceable Author Statement
IC Inferred by Curator
ND No biological Data available
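
Evidence codes are most useful as a filter on annotations. The sketch below stores the twelve codes listed above in a lookup table and drops annotations by code, for example to set aside purely electronic (IEA) annotations. The `Annotation` tuple, the gene product names, and the pairing of products with GO IDs are hypothetical and only illustrate the idea.

```python
from collections import namedtuple

# The twelve evidence codes listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}

# Hypothetical annotation record: gene product, GO term ID, evidence code.
Annotation = namedtuple("Annotation", ["gene_product", "go_id", "evidence"])

# Toy data: the gene product names and their assignments are made up.
ANNOTATIONS = [
    Annotation("geneA", "GO:0000750", "IDA"),
    Annotation("geneB", "GO:0000750", "IEA"),
    Annotation("geneC", "GO:0000750", "TAS"),
]


def filter_by_evidence(annotations, exclude=("IEA",)):
    """Keep annotations whose evidence code is known and not excluded."""
    kept = []
    for annotation in annotations:
        if annotation.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {annotation.evidence}")
        if annotation.evidence not in exclude:
            kept.append(annotation)
    return kept


for annotation in filter_by_evidence(ANNOTATIONS):
    print(annotation.gene_product, EVIDENCE_CODES[annotation.evidence])
```

Which codes to exclude depends on the analysis; the default here is only one possible choice.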

Useful information and links

COG: http://www.ncbi.nih.gov/COG
Science 1997 Oct 24;278(5338):631-7
BMC Bioinformatics 2003 Sep 11;4(1):41

GO: http://www.geneontology.org/
AmiGO: http://www.godatabase.org/cgi-bin/amigo/go.cgi
GOst: http://www.godatabase.org/cgi-bin/gost/gost.cgi
GOA: http://www.ebi.ac.uk/GOA/

Homework

1. Explain how to annotate a given DNA sequence using COG. (The public tools are not ready to do this, so explain how you would do it yourself.)
2. Look up the function of the gene(s) in the given DNA sequence using the Gene Ontology.