BI class 2010 Gene Ontology Overview and Perspective.

Slides:



Advertisements
Similar presentations
Annotation of Gene Function …and how thats useful to you.
Advertisements

Applications of GO. Goals of Gene Ontology Project.
24th Feb 2006 Jane Lomax Gene Ontology tutorial Talk:Using the Gene Ontology (GO) for Expression Analysis Practical:Onto-Express analysis tool Talk: GO.
25th June 2007 Jane Lomax Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
Microarray Data Analysis Day 2
What is Ontology? Dictionary:A branch of metaphysics concerned with the nature and relations of being. Barry Smith:The science of what is, of the kinds.
Gene Ontology John Pinney
Gene function analysis Stem Cell Network Microarray Course, Unit 5 May 2007.
CACAO - Remote training Gene Function and Gene Ontology Fall 2011
1 Using Gene Ontology. 2 Assigning (or Hypothesizing About) Biological Meaning to Clusters What do you want to be able to to? –Identify over-represented.
What is an ontology and Why should you care? Barry Smith with thanks to Jane Lomax, Gene Ontology Consortium 1.
COG and GO tutorial.
Internet tools for genomic analysis: part 2
Gene Annotation and GO SPH 247 Statistical Analysis of Laboratory Data.
1 Gene Ontology and Semantic Similarity Measures.
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
PAT project Advanced bioinformatics tools for analyzing the Arabidopsis genome Proteins of Arabidopsis thaliana (PAT) & Gene Ontology (GO) Hongyu Zhang,
Gene Annotation and GO SPH 247 Statistical Analysis of Laboratory Data May 5, 2015SPH 247 Statistical Analysis of Laboratory Data 1.
SPH 247 Statistical Analysis of Laboratory Data 1 May 12, 2015 SPH 247 Statistical Analysis of Laboratory Data.
Using The Gene Ontology: Gene Product Annotation.
Gene Ontology (GO) Project
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
Gene Set Enrichment Analysis (GSEA)
GO and OBO: an introduction. Jane Lomax EMBL-EBI What is the Gene Ontology? What is OBO? OBO-Edit demo & practical What is the Gene Ontology? What is.
Annotating Gene Products to the GO Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse.
The aims of the Gene Ontology project are threefold: - to compile vocabularies to describe components, functions and processes - to produce tools to query.
Gene Ontology Overview and Perspective Lung Development Ontology Workshop.
SPH 247 Statistical Analysis of Laboratory Data 1May 14, 2013SPH 247 Statistical Analysis of Laboratory Data.
Ontologies, data standards and controlled vocabularies.
GENE ONTOLOGY FOR THE NEWBIES Suparna Mundodi, PhD The Arabidopsis Information Resources, Stanford, CA.
Gene Ontology Consortium
The Gene Ontology project Jane Lomax. Ontology (for our purposes) “an explicit specification of some topic” – Stanford Knowledge Systems Lab Includes:
Gene Ontology Project
Gene Ontology TM (GO) Consortium Jennifer I Clark EMBL Outstation - European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SD, UK Objectives:
Gene expression analysis
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Gene Onotology Part 1: what is the GO? Harold J Drabkin Senior Scientific Curator The Jackson Laboratory Mouse Genome Informatics.
Monday, November 8, 2:30:07 PM  Ontology is the philosophical study of the nature of being, existence or reality as such, as well as the basic categories.
From Functional Genomics to Physiological Model: Using the Gene Ontology Fiona McCarthy, Shane Burgess, Susan Bridges The AgBase Databases, Institute of.
Manual GO annotation Evidence: Source AnnotationsProteins IEA:Total Manual: Total
Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
24th Feb 2006 Jane Lomax GO Further. 24th Feb 2006 Jane Lomax GO annotations Where do the links between genes and GO terms come from?
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Getting Started: a user’s guide to the GO GO Workshop 3-6 August 2010.
Functional Annotation and Functional Enrichment. Annotation Structural Annotation – defining the boundaries of features of interest (coding regions, regulatory.
1 Gene function annotation. 2 Outline  Functional annotation  Controlled vocabularies  Functional annotation at TAIR  Resources and tools at TAIR.
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
Getting Started: a user’s guide to the GO TAMU GO Workshop 17 May 2010.
A Common Language for Annotation of Genes from Yeast, Flies and Mice The Gene Ontologies …and Plants and Worms …and Humans …and anything else!
Gene Ontology Project
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
Introduction to the GO: a user’s guide NCSU GO Workshop 29 October 2009.
Scope of the Gene Ontology Vocabularies. Compile structured vocabularies describing aspects of molecular biology Describe gene products using vocabulary.
1 Annotation EPP 245/298 Statistical Analysis of Laboratory Data.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
An example of GO annotation from a primary paper GO Annotation Camp, July 2006 PMID:
Gene Ontology TM (GO) Consortium
2/3/2005 Gene Ontology (GO) The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions.
Gene Annotation & Gene Ontology May 24, Gene lists from RNAseq analysis What do you do with a list of 100s of genes that contain only the following.
Gene Annotation & Gene Ontology
Annotating with GO: an overview
GO : the Gene Ontology & Functional enrichment analysis
Introduction to the Gene Ontology
SPH 247 Statistical Analysis of Laboratory Data
Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI 25th June 2007 Jane Lomax.
What is an Ontology An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common.
Gene expression analysis
What is Ontology? s Dictionary:A branch of metaphysics concerned with the nature and relations of being. Barry Smith:The science of what is, of.
Insight into GO and GOA Angelica Tulipano , INFN Bari CNR
Presentation transcript:

BI class 2010 Gene Ontology Overview and Perspective

What is Ontology? Dictionary: A branch of metaphysics concerned with the nature and relations of being s

What is the Gene Ontology? Allows biologists to make queries across large numbers of genes without researching each one individually

So what does that mean? From a practical view, ontology is the representation of something we know about. “Ontologies" consist of a representation of things, that are detectable or directly observable, and the relationships between those things.

Car? Ontology -definition

Gene Ontology (GO) Consortium Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge. Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries. Members agree to contribute gene product annotations and associated sequences to GO database.

How does GO work? What does the gene product do? Where and does it act? Why does it perform these activities? What information might we want to capture about a gene product?

What is the Gene Ontology? Set of standard biological phrases (terms) which are applied to genes/proteins: –protein kinase –apoptosis –membrane

Molecular Function = elemental activity/task –the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity Biological Process = biological goal or objective –broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions Cellular Component = location or complex –subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme GO represents three biological domains

The GO is Actually Three Ontologies Biological Process GO term: tricarboxylic acid cycle Synonym: Krebs cycle Synonym: citric acid cycle GO id:GO: Cellular Component GO term: mitochondrion GO id: GO: Definition: A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration. Molecular Function GO term: Malate dehydrogenase. GO id: GO: (S)-malate + NAD(+) = oxaloacetate + NADH.

Cellular Component where a gene product acts

Cellular Component

Enzyme complexes in the component ontology refer to places, not activities.

Molecular Function activities or “jobs” of a gene product glucose-6-phosphate isomerase activity

Molecular Function insulin binding insulin receptor activity

Molecular Function A gene product may have several functions Sets of functions make up a biological process.

Biological Process a commonly recognized series of events cell division

Biological Process transcription

Biological Process regulation of gluconeogenesis

Biological Process limb development

Ontology Structure Terms are linked by two relationships –is-a  –part-of 

Ontology Structure cell membrane chloroplast mitochondrial chloroplast membrane is-a part-of

Ontology Structure Ontologies are structured as a hierarchical directed acyclic graph (DAG) Terms can have more than one parent and zero, one or more children

Ontology Structure cell membrane chloroplast mitochondrial chloroplast membrane Directed Acyclic Graph (DAG) - multiple parentage allowed

–what kinds of things exist? –what are the relationships between these things? eye part_of lens is_a sense organ develops from Optic placode A biological ontology is: A (machine) interpretable representation of some aspect of biological reality

GO Definitions: Each GO term has 2 Definitions A definition written by a biologist: necessary & sufficient conditions written definition (not computable) Graph structure: necessary conditions formal (computable)

Appropriate Relationships to Parents GO currently has 2 relationship types –Is_a An is_a child of a parent means that the child is a complete type of its parent, but can be discriminated in some way from other children of the parent. –Part_of A part_of child of a parent means that the child is always a constituent of the parent that in combination with other constituents of the parent make up the parent.

Placement in the Graph: Selecting Parents To make the most precise definitions, new terms should be placed as children of the parent that is closest in meaning to the term. To make the most complete definitions, terms should have all of the parents that are appropriate. In an ontology as complicated as the GO this is not as easy as it seems.

True Path Violations Create Incorrect Definitions..”the pathway from a child term all the way up to its top-level parent(s) must always be true". chromosome Part_of relationship nucleus

True Path Violations..”the pathway from a child term all the way up to its top-level parent(s) must always be true". chromosome Mitochondrial chromosome Is_a relationship

True Path Violations..”the pathway from a child term all the way up to its top-level parent(s) must always be true". chromosome Mitochondrial chromosome Is_a relationship Part_of relationship nucleus A mitochondrial chromosome is not part of a nucleus!

True Path Violations..”the pathway from a child term all the way up to its top-level parent(s) must always be true". nucleuschromosome Nuclear chromosome Mitochondrial chromosome Is_a relationships Part_of relationship mitochondrion Part_of relationship

The Development Node (some example for consistent definitions)

Cell level [i] y cell differentiation ---[p] y cell fate commitment [p] y cell fate specification [p] y cell fate determination ---[p] y cell development [p] y cellular morphogenesis during differentiation [p] y cell maturation

y cell differentiation The process whereby a relatively unspecialized cell acquires specialized features of a y cell. [i] y cell differentiation ---[p] y cell fate commitment [p] y cell fate specification [p] y cell fate determination ---[p] y cell development [p] y cellular morphogenesis during differentiation [p] y cell maturation

y cell fate commitment The process whereby the developmental fate of a cell becomes restricted such that it will develop into a y cell. [i] y cell differentiation ---[p] y cell fate commitment [p] y cell fate specification [p] y cell fate determination ---[p] y cell development [p] y cellular morphogenesis during differentiation [p] y cell maturation

y cell fate specification The process whereby a cell becomes capable of differentiating autonomously into a y cell in an environment that is neutral with respect to the developmental pathway. Upon specification, the cell fate can be reversed. [i] y cell differentiation ---[p] y cell fate commitment [p] y cell fate specification [p] y cell fate determination ---[p] y cell development [p] y cellular morphogenesis during differentiation [p] y cell maturation

y cell fate determination The process whereby a cell becomes capable of differentiating autonomously into a y cell regardless of its environment; upon determination, the cell fate cannot be reversed. [i] y cell differentiation ---[p] y cell fate commitment [p] y cell fate specification [p] y cell fate determination ---[p] y cell development [p] y cellular morphogenesis during differentiation [p] y cell maturation

Gene Ontology widely adopted AgBase

Terms are defined graphically relative to other terms

The Gene Ontology (GO) 1.Build and maintain logically rigorous and biologically accurate ontologies 2.Comprehensively annotate reference genomes 3.Support genome annotation projects for all organisms 4.Freely provide ontologies, annotations and tools to the research community

The GO is still developing daily both in ontological structures and in domain knowledge Ontology development workshops focus on specific domains needing experts 2 workshops / year 1.Metabolism and cell cycle 2.Immunology and defense response 3.Early CNS development 4.Peripheral nervous system development 5.Blood Pressure Regulation 6.Muscle Development Building the ontologies

Mappings files Fatty acid biosynthesis ( Swiss-Prot Keyword) EC: (EC number) IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit ( InterPro entry) GO:Fatty acid biosynthesis ( GO: ) GO:acetyl-CoA carboxylase activity ( GO: ) GO:acetyl-CoA carboxylase activity (GO: )

725 new terms related to immunology 127 new terms added to cell type ontology Building the ontology: Immune System Process Red part_of Blue is_a

P05147 PMID: GO: IDA P05147GO: IDA PMID: GO Term Reference Evidence Annotating Gene Products using GO Gene Product

Gene protein inherits GO term There is evidence that this gene product can be best classified using this term The source of the evidence and other information is included There is agreement on the meaning of the term

Annotations for APP: amyloid beta (A4) precursor proteinAP Annotations are assertions

NO Direct Experiment Inferred from evidence Direct Experiment in organism We use evidence codes to describe the basis of the annotation IDA: Inferred from direct assay IPI: Inferred from physical interaction IMP: Inferred from mutant phenotype IGI: Inferred from genetic interaction IEP: Inferred from expression pattern IEA: Inferred from electronic annotation ISS: Inferred from sequence or structural similarity TAS: Traceable author statement NAS: Non-traceable author statement IC: Inferred by curator RCA: Reviewed Computational Analysis ND: no data available

GO structure GO isn’t just a flat list of biological terms terms are related within a hierarchy

GO structure gene A

GO structure This means genes can be grouped according to user- defined levels Allows broad overview of gene set or genome

GO Annotation Stats (2007) I GO Annotations Total manual GO annotations - 388,633 Total proteins with manual annotations – 80,402 Contributing Groups (including MGI): - 19 Total Pub Med References – 346,002 Total number predicted annotations – 17,029,553 Total number taxa – 129,318 Total number distinct proteins – 2,971,374

gene -> GO term associated genes GO annotations GO database genome and protein databases

Now we can query across all annotations based on shared biological activity. Annotations of gene products to GO are genome specific

GO browser

Search on ‘mesoderm development’ mesoderm development

Definition of mesoderm development Gene products involved in mesoderm development

Traditional analysis Gene 1 Apoptosis Cell-cell signaling Protein phosphorylation Mitosis … Gene 2 Growth control Mitosis Oncogenesis Protein phosphorylation … Gene 3 Growth control Mitosis Oncogenesis Protein phosphorylation … Gene 4 Nervous system Pregnancy Oncogenesis Mitosis … Gene 100 Positive ctrl. of cell prolif Mitosis Oncogenesis Glucose transport …

Using GO annotations But by using GO annotations, this work has already been done GO: : apoptosis

Grouping by process Apoptosis Gene 1 Gene 53 Mitosis Gene 2 Gene 5 Gene45 Gene 7 Gene 35 … Positive ctrl. of cell prolif. Gene 7 Gene 3 Gene 12 … Growth Gene 5 Gene 2 Gene 6 … Glucose transport Gene 7 Gene 3 Gene 6 …

Anatomy of a GO term id: GO: name: gluconeogenesis namespace: process def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [ exact_synonym: glucose biosynthesis xref_analog: MetaCyc:GLUCONEO-PWY is_a: GO: is_a: GO: unique GO ID term name definition synonym database ref parentage ontology

GO is a functional annotation system of great utility to the data- driven biologist

GO enables genomic data analysis Microarrays allow biologists to record changes in gene function across entire genomes Result: Vast amounts of gene expression data desperately needing cataloging and tagging Many data analysis tools use GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations

OCT 13, 2006 Cancer Genome Projects GO supports functional classifications

GO is wildly successful FIGURE 3. Representative cell-type-specific genes and corresponding molecular functions. Nature: January 2007

Comprehensively annotate Reference Genomes Saccharomyces cerevisiae Schizosaccharomyces pombe Arabidopsis thaliana Human Mouse Fly Rat Chicken Zebrafish Worm Dicty E.coli

Species coverage All major eukaryotic model organism species Human via GOA group at UniProt Several bacterial and parasite species through TIGR and GeneDB at Sanger –many more in pipeline

Annotation coverage

GO tools GO resources are freely available to anyone to use without restriction –Includes the ontologies, gene associations and tools developed by GO Other groups have used GO to create tools for many purposes:

GO tools Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets

GO tools Many tools exist that use GO to find common biological functions from a list of genes:

GO tools Most of these tools work in a similar way: –input a gene list and a subset of ‘interesting’ genes –tool shows which GO categories have most interesting genes associated with them i.e. which categories are ‘enriched’ for interesting genes –tool provides a statistical measure to determine whether enrichment is significant

GO for microarray analysis Annotations give ‘function’ label to genes Ask meaningful questions of microarray data e.g. –genes involved in the same process, same/different expression patterns?

Using GO in practice statistical measure –how likely your differentially regulated genes fall into that category by chance microarray 1000 genes experiment100 genes differentially regualted mitosis – 80/100 apoptosis – 40/100 p. ctrl. cell prol. – 30/100 glucose transp. – 20/100

Using GO in practice However, when you look at the distribution of all genes on the microarray: ProcessGenes on array # genes expected in occurred 100 random genes mitosis 800/ apoptosis 400/ p. ctrl. cell prol. 100/ glucose transp. 50/

Enrichment tools GO is developing its own enrichment tool as part of the GO browser AmiGO Currently in testing phase, should be released next month