Ontologies, data standards and controlled vocabularies.


Why use standards and CVs?
- Very important in high-throughput biology, to sort through the vast amounts of data
- To use the same data labels universally
- To enable quick retrieval of data
- To enable easy comparison of data
- To remove ambiguities

What’s in a name? What is a cell?

Ambiguities in naming
- Different names can be used to describe the same concept, e.g.:
  - Glucose synthesis
  - Glucose biosynthesis
  - Glucose formation
  - Glucose anabolism
  - Gluconeogenesis
- All refer to the process of making glucose
- This makes it difficult to compare the information
- Solution: use ontologies and data standards
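The naming problem above can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy controlled vocabulary in which one preferred term carries the other names as synonyms; the identifier `EX:0000001` and the helper names are made up for the example and are not real ontology accessions or library functions.

```python
# Toy controlled vocabulary: one preferred term plus its accepted synonyms.
# "EX:0000001" is a made-up identifier, not a real ontology accession.
VOCABULARY = {
    "EX:0000001": {
        "name": "gluconeogenesis",
        "synonyms": [
            "glucose synthesis",
            "glucose biosynthesis",
            "glucose formation",
            "glucose anabolism",
        ],
    },
}


def build_label_index(vocabulary):
    """Map every lower-cased name and synonym to its term identifier."""
    index = {}
    for term_id, term in vocabulary.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = term_id
    return index


LABEL_INDEX = build_label_index(VOCABULARY)


def normalise(label):
    """Return the term identifier for a free-text label, or None if unknown."""
    return LABEL_INDEX.get(label.strip().lower())


# Three differently worded labels collapse onto a single identifier.
labels = ["Glucose synthesis", "gluconeogenesis", "glucose anabolism"]
print({normalise(label) for label in labels})
```

Once every label resolves to one identifier, records annotated with different wording can be compared directly.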

Ontologies
- An ontology is a formal specification of terms and the relationships between them; ontologies are widely used in biology and bioinformatics (e.g. taxonomy)
- The relationships are important and are represented as graphs
- Ontology terms should have definitions
- Ontologies are machine-readable
- They are needed for ordering and comparing large data sets

Gene Ontology (GO)
- Many annotation systems are organism-specific or work at different levels of granularity
- GO introduced a standard vocabulary, first used for mouse, fly and yeast but now generic
- Three ontologies: molecular function, biological process and cellular component

GO Ontologies
- Molecular function: tasks performed by a gene product
  - e.g. G-protein coupled receptor
- Biological process: broad biological goals accomplished by one or more gene products
  - e.g. G-protein signaling pathway
- Cellular component: part(s) of a cell of which a gene product is a component; includes the extracellular environment of cells
  - e.g. nucleus, membrane etc.

GO hierarchy
- Relationships: “is-a” and “part of”
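How “is-a” and “part of” relationships form a graph can be sketched as follows. The terms and edges are a simplified toy hierarchy chosen for illustration, not the actual GO structure, and `ancestors` is a hypothetical helper.

```python
# Toy graph: child term -> list of (relationship, parent term).
# Simplified for illustration; these edges are not taken from GO itself.
PARENTS = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following is_a / part_of edges upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("nucleolus")))
```

Because a term inherits all of its ancestors, a gene product annotated to a specific term is implicitly annotated to every broader term above it.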

How do gene products get GO terms?
- Electronic annotation:
  - Through mappings to other biological entities, followed by automatic inference to proteins
- Manual annotation:
  - Model organism databases
  - Gene Ontology Annotation (GOA) project
- Evidence codes:
  - Attached to all GO annotations to show the source

Evidence Codes
IEA   Inferred from Electronic Annotation
IDA   Inferred from Direct Assay
IMP   Inferred from Mutant Phenotype
IPI   Inferred from Protein Interaction
IEP   Inferred from Expression Pattern
IGI   Inferred from Genetic Interaction
ISS*  Inferred from Sequence or Structural Similarity
IGC   Inferred from Genomic Context
RCA   Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred from Curator Judgement
ND    No Data available
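Evidence codes make it possible to separate electronic from curator-assigned annotations. The sketch below assumes that IEA is the only code assigned without a curator and treats every other listed code as manual; the annotation tuples are invented placeholders.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Protein Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IGI": "Inferred from Genetic Interaction",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IGC": "Inferred from Genomic Context",
    "RCA": "Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred from Curator Judgement",
    "ND": "No Data available",
}

# Assumption: only IEA annotations are made without curator involvement.
ELECTRONIC_CODES = {"IEA"}


def is_manual(code):
    """Return True for curator-assigned codes; reject codes not in the table."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code}")
    return code not in ELECTRONIC_CODES


# Placeholder annotations: (gene product, GO term, evidence code).
annotations = [
    ("protein_a", "term_x", "IEA"),
    ("protein_a", "term_y", "IDA"),
    ("protein_b", "term_x", "TAS"),
]
manual_only = [a for a in annotations if is_manual(a[2])]
print(manual_only)
```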

Electronic annotation: GO mappings

External annotation mapped to a GO term:
- Fatty acid biosynthesis (SwissProt keyword): GO:fatty acid biosynthesis (GO: )
- EC: (EC number): GO:acetyl-CoA carboxylase activity (GO: )
- IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry): GO:acetyl-CoA carboxylase activity (GO: )
- MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP): GO:DNA repair (GO: )
Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17
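Electronic annotation through mappings can be sketched as a lookup from a protein's cross-references to GO terms. The pairing of each source with a GO term is inferred from the names on the slide, GO term names stand in for the accessions (which are not given here), and `protein_1` and the function name are hypothetical.

```python
# (source database, external identifier) -> GO term names.
# Term names are used because the GO accessions are not given in the text.
MAPPINGS = {
    ("SwissProt keyword", "Fatty acid biosynthesis"): ["fatty acid biosynthesis"],
    ("InterPro", "IPR000438"): ["acetyl-CoA carboxylase activity"],
    ("HAMAP", "MF_00527"): ["DNA repair"],
}


def electronic_annotations(protein_id, cross_references, mappings=MAPPINGS):
    """Return (protein, GO term, evidence code, source) for each mapped cross-reference."""
    results = []
    for xref in cross_references:
        for go_term in mappings.get(xref, []):
            source = f"{xref[0]}:{xref[1]}"
            # Annotations inferred from a mapping carry the IEA evidence code.
            results.append((protein_id, go_term, "IEA", source))
    return results


# A hypothetical protein with one InterPro match and one unmapped reference.
xrefs = [("InterPro", "IPR000438"), ("InterPro", "IPR_UNMAPPED")]
for annotation in electronic_annotations("protein_1", xrefs):
    print(annotation)
```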

UniProt entry

Automatic transfer of annotations to orthologs
Ensembl GO term projection via gene homology (COMPARA)
[Figure: projections between species; labels shown are cow, dog, rat, mouse, chicken, Anopheles and Drosophila]
- Homologies between different species are calculated
- GO terms are projected from MANUAL annotation only (IDA, IEP, IGI, IMP, IPI)
- Only one-to-one and apparent one-to-one orthologies are used
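The projection rules above can be sketched as a filter over annotations and orthology pairs. The orthology labels, gene names and function name are invented for the example, and tagging projected annotations as IEA is an assumption rather than something stated on the slide.

```python
# Evidence codes that qualify an annotation for projection (from the slide).
PROJECTABLE_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Hypothetical labels for the orthology types that are allowed.
ALLOWED_ORTHOLOGY = {"one-to-one", "apparent one-to-one"}


def project_annotations(source_annotations, orthologs):
    """Project GO terms from source genes onto their orthologs.

    source_annotations: list of (gene, go_term, evidence_code)
    orthologs: list of (source_gene, target_gene, orthology_type)
    """
    targets = {
        source: target
        for source, target, kind in orthologs
        if kind in ALLOWED_ORTHOLOGY
    }
    projected = []
    for gene, go_term, code in source_annotations:
        target = targets.get(gene)
        if target is not None and code in PROJECTABLE_CODES:
            # Assumption: a projected annotation is recorded as electronic (IEA).
            projected.append((target, go_term, "IEA"))
    return projected


mouse_annotations = [
    ("mouse_gene_1", "term_x", "IDA"),  # projected
    ("mouse_gene_1", "term_y", "IEA"),  # skipped: not a qualifying code
    ("mouse_gene_2", "term_z", "IMP"),  # skipped: orthology is one-to-many
]
mouse_to_rat = [
    ("mouse_gene_1", "rat_gene_1", "one-to-one"),
    ("mouse_gene_2", "rat_gene_2", "one-to-many"),
]
print(project_annotations(mouse_annotations, mouse_to_rat))
```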

Manual annotation: GOA Project
- Largest open-source contributor of annotations to GO
- Member of the GO Consortium since 2001
- Provides annotation for more than 130,000 species
- GOA’s priority is to annotate the human proteome
- GOA is responsible for human, chicken, bovine and many other annotations for the GO Consortium
- Annotation is done by reading the literature

Reference Genomes
- Arabidopsis thaliana
- Caenorhabditis elegans
- Danio rerio (zebrafish)
- Dictyostelium discoideum
- Drosophila melanogaster
- Escherichia coli
- Homo sapiens
- Saccharomyces cerevisiae
- Mus musculus
- Schizosaccharomyces pombe
- Gallus gallus
- Rattus norvegicus
Aims:
- Comprehensive annotation of a set of disease-related proteins in human
- Generate a reliable set of GO annotations for the 12 selected genomes
- Empower the comparative methods used in first-pass annotation of other proteomes

Accessing GO data (1)

Accessing GO data (2)
QuickGO browser: Human Insulin Receptor (P06213)

Accessing GO data (3)
Gene Association Files

Accessing GO data (3)
Gene Association File example
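A gene association file is tab-delimited, with comment lines starting with `!`, so it can be read with a few lines of Python. This is a minimal sketch that names the first 15 columns of the format; the example record reuses the insulin receptor accession P06213 but carries placeholder GO and reference identifiers, so it is illustrative and not a real annotation.

```python
# Names for the first 15 tab-separated columns of a gene association file.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]


def parse_gaf(lines):
    """Turn gene association lines into dictionaries, skipping comments and blanks."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # zip stops at the shorter sequence, so any extra columns are ignored.
        records.append(dict(zip(GAF_COLUMNS, fields)))
    return records


# Illustrative input: GO:0000000 and PMID:0000000 are placeholders.
example_lines = [
    "!illustrative comment line",
    "\t".join([
        "UniProtKB", "P06213", "INSR", "", "GO:0000000", "PMID:0000000",
        "IDA", "", "F", "Insulin receptor", "", "protein",
        "taxon:9606", "20070101", "UniProt",
    ]),
]

for record in parse_gaf(example_lines):
    print(record["db_object_id"], record["go_id"], record["evidence_code"])
```

In practice the same function could be given an open file handle for a file downloaded from the GOA site, since iterating over a file yields lines.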

Downloading GOA data
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

Uses of GO 1
Functional annotation of proteins

Uses of GO 2
Find functional information on interacting proteins (IntAct)

Uses of GO 3
Analysis of high-throughput data using GO classification
- Microarray data analysis
- Proteomics data analysis
Larkin JE et al, Physiol Genomics, 2004
Cunliffe HE et al, Cancer Res, 2003

Other Ontologies: Open Biomedical Ontologies
- Central location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences
- Provides a simple format for ontologies that can encode terms, relationships between terms and definitions of terms, including those taken from external ontologies

Scope of Open Biomedical Ontologies
- Anatomy
- Animal natural history and life history
- Chemical
- Development
- Ethology
- Evidence codes
- Experimental conditions
- Genomic and proteomic
- Metabolomics
- OBO relationship types
- Phenotype
- Taxonomic classification

Ontology Lookup Service (OLS)
- Single point of query for currently 47 ontologies
- Ontologies are updated daily from CVS repositories, including the OBO CVS repository and the PRIDE CVS repository
- A tool that offers interactive and programmatic interfaces for queries on term names, synonyms, relationships, annotations and database cross-references
- Originally developed for using ontologies in PRIDE

The issue faced
- These relationships have consequences when querying a database annotated using the ontology
- What happens when I ask for PRIDE experiments describing the proteome of brain tissue?

Using Ontologies in PRIDE
For an experiment you want to define:
- Species: Newt / NCBI Taxonomy ID
- Tissue / organ / cell type: BRENDA Tissue ontology, Cell Type ontology
- Sub-cellular component: Gene Ontology: GO
- Disease: Human Disease: DOID
- Genotype: GO
- Sample Processing: PSI Ontology
- Mass Spectrometry: PSI-MS Ontology
- Protein Modifications: PSI-MOD Ontology
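Tying each descriptive field to a particular ontology means an annotation can be checked automatically. The sketch below assumes accessions of the form `PREFIX:number` and assumes the prefixes BTO, CL, GO, DOID and MOD for the ontologies named above; the field names, accessions and function are invented for illustration and do not reflect the PRIDE data model.

```python
# Assumed accession prefixes for some of the ontologies named on the slide.
EXPECTED_PREFIXES = {
    "tissue": {"BTO"},                 # BRENDA Tissue ontology
    "cell_type": {"CL"},               # Cell Type ontology
    "subcellular_component": {"GO"},   # Gene Ontology
    "disease": {"DOID"},               # Human Disease ontology
    "protein_modification": {"MOD"},   # PSI-MOD
}


def check_sample(sample):
    """Return a list of problems; an empty list means every field passes."""
    problems = []
    for field, accession in sample.items():
        expected = EXPECTED_PREFIXES.get(field)
        if expected is None:
            problems.append(f"{field}: unknown field")
            continue
        prefix = accession.split(":", 1)[0]
        if prefix not in expected:
            problems.append(f"{field}: {accession} is not from {sorted(expected)}")
    return problems


# Placeholder accessions; the disease field wrongly uses a GO identifier.
sample = {
    "tissue": "BTO:0000000",
    "subcellular_component": "GO:0000000",
    "disease": "GO:0000000",
}
print(check_sample(sample))
```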

OLS usage examples
- What is the accession for “mitochondrion” in GO? In MeSH?
  - Search by term name in a specific ontology or across all ontologies
- I’m looking for a term to annotate my protocol step, but I’m not sure which term to use.
  - Browse an ontology
- I’m looking for all the experiments done on liver tissue.
  - Get all child terms of liver and query on those as well
- My data set was annotated with GO version 123, but that was a long time ago.
  - Get updated term names for the identifiers you have and see whether any have been made obsolete
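The liver example, where a query must also cover child terms, can be sketched as a query expansion step. The tissue hierarchy, experiment identifiers and function names below are invented for illustration; a real lookup would fetch the child terms from an ontology service rather than a hard-coded dictionary.

```python
# Toy tissue hierarchy: term -> direct child terms (hypothetical).
CHILDREN = {
    "liver": ["left liver lobe", "right liver lobe"],
    "left liver lobe": [],
    "right liver lobe": [],
    "brain": ["cerebellum"],
    "cerebellum": [],
}

# Hypothetical experiments, each annotated with a single tissue term.
EXPERIMENT_TISSUE = {
    "exp_1": "liver",
    "exp_2": "right liver lobe",
    "exp_3": "cerebellum",
}


def expand_term(term, children=CHILDREN):
    """Return the term together with all of its descendants."""
    expanded = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def experiments_for(term):
    """Find experiments annotated with the term or any descendant of it."""
    wanted = expand_term(term)
    return sorted(
        experiment
        for experiment, tissue in EXPERIMENT_TISSUE.items()
        if tissue in wanted
    )


print(experiments_for("liver"))
```

Without the expansion step, a query for "liver" would miss the experiment annotated with the more specific lobe term.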

Standards for data exchange
- Systems Biology Markup Language (SBML): computer-readable format for representing models of networks
- Biological Pathways Exchange (BioPAX): format for representing pathways
- Proteomics Standards Initiative (PSI, MIAPE)
- Microarray standards: MIAME and MAGE

MIAPE/MIAME principles
- Enough information to:
  - Remove ambiguity in the experiment
  - Allow easy interpretation of results
  - Allow the experiment to be repeated
  - Enable comparison across similar experiments
- Use controlled vocabularies

Using ontologies and standards
- So much data in different places: need to organize and share it
- Used for data retrieval and comparison: easier to query
- Used for data integration and exchange: standard representation
- Used for evaluation: need a “gold standard”