GO terms implicitly refer to other terms

Presentation transcript:

GO terms implicitly refer to other terms
–cysteine biosynthesis
–myoblast fusion
–hydrogen ion transporter activity
–snoRNA catabolism
–wing disc pattern formation
–epidermal cell differentiation
–regulation of flower development
–interleukin-18 receptor complex
–B-cell differentiation
–dorsal ectoderm

biosynthesis is_a metabolism

cysteine is_a serine family amino acid is_a amino acid is_a amine

cysteine is_a serine family amino acid is_a amino acid is_a serine

Composed terms currently cause problems
–No link to external ontology term
–Redundancy
–Inconsistency
–Extra work
–Annotation bottleneck
–Tangled DAGs and confusing displays
  we have no way to disentangle
Solution so far:
–fix errors based on results of term name parsing (Obol)
  reactive, not proactive

Solution: actively manage composed terms
Explicit pre-coordination
–Composed terms should now/soon be coordinated using oboedit plugin
  building block terms are recorded in ontology along with composite term
Benefits:
–Correct DAG structure can be inferred from external ontologies
  e.g. make sure GO + CHEBI “align”
–placement & consistency checking automated
–additional work can be automated
  synonyms, text definitions

How will terms be pre-coordinated by oboedit?
How do we record a definition for a composite term?
–using a logical definition (computational essence)
A logical definition consists of:
–a generic term (aka genus)
–relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
Can be written in natural language as:
–A <genus> which <differentiae>
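The genus-plus-differentiae structure described above can be sketched as a small data type. This is only an illustration, not how oboedit stores definitions: the class name `LogicalDefinition` and its method `to_text` are hypothetical, and term names stand in for real identifiers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalDefinition:
    """Hypothetical container for a genus-differentia definition."""
    genus: str                                  # the generic term
    differentiae: Tuple[Tuple[str, str], ...]   # (relation, filler term) pairs

    def to_text(self) -> str:
        # Render the Aristotelian form: "A <genus> which <differentiae>".
        clauses = " and ".join(f"{rel} {filler}" for rel, filler in self.differentiae)
        article = "An" if self.genus[0].lower() in "aeiou" else "A"
        return f"{article} {self.genus} which {clauses}"


cysteine_biosynthesis = LogicalDefinition(
    genus="biosynthesis",
    differentiae=(("outputs", "cysteine"),),
)
print(cysteine_biosynthesis.to_text())
```

Keeping the relation name next to each filler term is the point of the structure: the same pair of terms with a different relation would be a different definition.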

Example of pre-coordination
cysteine biosynthesis
generic term:
–biosynthesis
discriminating characteristics:
–outputs cysteine
–natural language (Aristotelian style):
  a biosynthesis process which outputs cysteine

Example in Obo format
[Term]
id: GO:
name: cysteine biosynthesis
intersection_of: GO: ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO: ! serine family amino acid biosynthesis
is_a: GO: ! cysteine metabolism
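A stanza of this shape can be read with a few lines of code. The sketch below is a minimal hand-written reader, not the oboedit parser. The GO identifiers in `STANZA` (`GO:xxxxxx1` and so on) are made-up placeholders, because the transcript does not preserve the real ones; only the CHEBI identifier comes from the slide.

```python
# Placeholder GO ids: the real accession numbers are not in the transcript.
STANZA = """
[Term]
id: GO:xxxxxx0
name: cysteine biosynthesis
intersection_of: GO:xxxxxx1 ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
is_a: GO:xxxxxx2 ! serine family amino acid biosynthesis
is_a: GO:xxxxxx3 ! cysteine metabolism
"""


def parse_stanza(text):
    """Split one [Term] stanza into genus, differentiae and is_a parents."""
    term = {"genus": None, "differentiae": [], "is_a": []}
    for raw in text.strip().splitlines():
        line = raw.split("!", 1)[0].strip()      # drop the trailing "! comment"
        if not line or line.startswith("["):
            continue                             # skip blanks and the [Term] header
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                term["genus"] = parts[0]         # a bare id is the generic term
            else:
                term["differentiae"].append((parts[0], parts[1]))
        elif tag == "is_a":
            term["is_a"].append(value)
        else:
            term[tag] = value                    # id, name, ...
    return term


term = parse_stanza(STANZA)
print(term["name"], term["genus"], term["differentiae"])
```

An intersection_of line with a single identifier is treated as the genus, and one with a relation followed by an identifier as a differentia, which mirrors how the slide reads the stanza.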

Alternate syntax
used in pheno-syntax
more compact
similar to OWL abstract syntax
I use Obo1.2 format or natural language in the rest of this presentation
GO:cysteine_biosynthesis == GO:biosynthesis ∏ outputs(CHEBI:cysteine)

This allows us to dynamically untangle
Process axis view (primary is_as, via generic term):
–biological_process
  metabolism
  –biosynthesis
    »cysteine biosynthesis
Process participant axis view:
–amine
  amino acid
  –serine family amino acid
    »cysteine
Combined view
–(same as current tangled diamond lattice)
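One way to picture the combined view is to derive it from the two axes instead of storing it. The sketch below assumes a toy is_a table built from the chains shown on the earlier slides. The function names and the three logical definitions in `DEFS` are illustrative assumptions, and the subsumption test is a simplified stand-in for a real reasoner.

```python
# Toy is_a links taken from the two axes on the slides.
IS_A = {
    "biosynthesis": ["metabolism"],
    "cysteine": ["serine family amino acid"],
    "serine family amino acid": ["amino acid"],
    "amino acid": ["amine"],
}

# Assumed logical definitions: (genus, [(relation, filler), ...]).
DEFS = {
    "amino acid biosynthesis": ("biosynthesis", [("outputs", "amino acid")]),
    "serine family amino acid biosynthesis":
        ("biosynthesis", [("outputs", "serine family amino acid")]),
    "cysteine biosynthesis": ("biosynthesis", [("outputs", "cysteine")]),
}


def ancestors(term):
    """All terms reachable from `term` by following is_a links."""
    seen, stack = set(), [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child, parent):
    return child == parent or parent in ancestors(child)


def is_subsumed(a, b):
    """True if composed term `a` falls under composed term `b`."""
    genus_a, diffs_a = DEFS[a]
    genus_b, diffs_b = DEFS[b]
    return subsumed_by(genus_a, genus_b) and all(
        any(ra == rb and subsumed_by(fa, fb) for ra, fa in diffs_a)
        for rb, fb in diffs_b
    )


def inferred_subsumers(name):
    return sorted(o for o in DEFS if o != name and is_subsumed(name, o))


print(inferred_subsumers("cysteine biosynthesis"))
```

The is_a links between the composed terms are never written down here. They follow from the process axis and the participant axis, which is what lets a display show either axis alone or the combined lattice.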

Obol demo

Recording the relationship is important
Why not just a simple cross-product?
–e.g. biosynthesis x cysteine
Relationships are important for reasoning and querying
–Consider:
  cysteine biosynthesis from serine
  mRNA export from nucleus during heat stress
Without the relations, the logical definition is not specific enough
–the essence is not captured
Relations should come from RO
–more required

Multiple discriminating characteristics are allowed
Cysteine biosynthesis from serine
–Generic term: biosynthesis
–Discriminating characteristics:
  output cysteine
  input serine
[Term]
name: cysteine biosynthesis from serine
intersection_of: GO: ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
intersection_of: input CHEBI:17822 ! serine

Composite terms can be nested
[Term]
id: GO:xxxxxxx
name: regulation of cysteine biosynthesis
intersection_of: GO: ! regulation of biological process
intersection_of: regulates GO: ! cysteine biosynthesis
[Term]
id: GO:
name: cysteine biosynthesis
intersection_of: GO: ! biosynthesis
intersection_of: outputs CHEBI:15356 ! cysteine
regulation^regulates(biosynthesis^outputs(cysteine)) YES
regulation^regulates(biosynthesis)^outputs(cysteine) NO
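The difference between the YES and NO forms is a difference in nesting, which a small recursive structure makes explicit. This is a sketch under the assumption that a filler may itself be a composed expression. The class `Expr` and its `compact` method are hypothetical names, and the output mimics the `^relation(filler)` notation used on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Expr:
    """Hypothetical nested genus-differentia expression."""
    genus: str
    differentiae: List[Tuple[str, Union["Expr", str]]] = field(default_factory=list)

    def compact(self) -> str:
        # Write genus^relation(filler), recursing into nested fillers.
        out = self.genus
        for rel, filler in self.differentiae:
            inner = filler.compact() if isinstance(filler, Expr) else filler
            out += f"^{rel}({inner})"
        return out


# Regulation of (a biosynthesis that outputs cysteine): the intended reading.
nested = Expr("regulation", [
    ("regulates", Expr("biosynthesis", [("outputs", "cysteine")])),
])

# A regulation that itself outputs cysteine: the reading marked NO.
flat = Expr("regulation", [
    ("regulates", "biosynthesis"),
    ("outputs", "cysteine"),
])

print(nested.compact())
print(flat.compact())
```

In the nested form the `outputs` relation belongs to the inner biosynthesis term, and in the flat form it attaches to the regulation process itself.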

Composite terms can optionally be manufactured in bulk
Generic term: {metabolism,biosynthesis}
Differentia: has_output {serine, cysteine, …}
With caution…
–Sparse vs dense matrices
–not all combinations are types
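Bulk manufacture amounts to a filtered cross-product. The sketch below assumes a caller-supplied validity check, since not every combination is a real type. The `VALID` set is an arbitrary illustration of a sparse matrix and makes no claim about which combinations belong in GO.

```python
from itertools import product

genera = ["metabolism", "biosynthesis"]
fillers = ["serine", "cysteine"]

# Hypothetical whitelist standing in for curator judgement (a sparse matrix).
VALID = {
    ("metabolism", "cysteine"),
    ("biosynthesis", "serine"),
    ("biosynthesis", "cysteine"),
}


def manufacture(genera, fillers, is_valid, relation="has_output"):
    """Generate composed terms for every genus/filler pair that passes is_valid."""
    return [
        f"{genus}^{relation}({filler})"
        for genus, filler in product(genera, fillers)
        if is_valid(genus, filler)
    ]


terms = manufacture(genera, fillers, lambda g, f: (g, f) in VALID)
for t in terms:
    print(t)
```

Passing the check in as a function keeps the generator generic: a dense matrix uses a check that always returns True, and a sparse one uses a curated list like the one above.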

On the importance of necessary and sufficient conditions
Why intersection_of? Why not just make normal links in the GO DAG?
–normal relationships are for necessary conditions only
–we want both necessary and sufficient conditions
  captures the essence of the term

Normal DAG links only capture necessary conditions, not essence
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

Indistinguishable by DAG
monocyte activation
–is_a immune cell activation
–part_of inflammatory response
text def: A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor

essence captured by genus-differentia
macrophage activation
–is_a immune cell activation
–part_of inflammatory response
id: GO:macrophage_activation
intersection_of: GO:cell_activation
intersection_of: activates CL:macrophage
macrophage activation
–is_a cell activation (genus)
–activates CL:macrophage
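The macrophage and monocyte slides can be restated as a comparison of two records. In the sketch below the DAG links are copied from the slides. The logical definition for monocyte activation, including the identifier `CL:monocyte`, is an assumption made by analogy with the macrophage stanza and does not appear in the presentation.

```python
# Necessary conditions only: the ordinary DAG links shown on the slides.
DAG_LINKS = {
    "macrophage activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
    "monocyte activation": {
        ("is_a", "immune cell activation"),
        ("part_of", "inflammatory response"),
    },
}

# Necessary and sufficient conditions: (genus, differentiae).
# The monocyte entry is assumed by analogy; CL:monocyte is a placeholder id.
LOGICAL_DEFS = {
    "macrophage activation": ("GO:cell_activation", (("activates", "CL:macrophage"),)),
    "monocyte activation": ("GO:cell_activation", (("activates", "CL:monocyte"),)),
}


def distinguishable(a, b, table):
    """Two terms are distinguishable in a table if their records differ."""
    return table[a] != table[b]


a, b = "macrophage activation", "monocyte activation"
print("by DAG links:", distinguishable(a, b, DAG_LINKS))
print("by logical definition:", distinguishable(a, b, LOGICAL_DEFS))
```

With only the DAG links a program sees two identical records, so the difference lives solely in the free-text definitions. The differentia is what makes it computable.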

Current status of pre-coordinated terms
SO already contains composite terms
–46 pre-coordinated terms
–A silenced gene is a gene which has the quality of being silenced
GO-BP/CL integration underway
–retrospectively pre-coordinated terms
Obol page has pre-coordinated terms from automatic parsing
–

Pre- vs post-coordinated
Pre-coordination
–terms are in ontology with IDs and computable definitions
–increases complexity of ontology
–complexity can be managed by tools
  e.g. new oboedit features
Post-coordination
–terms are combined in the database
–forces more complexity in database schema and database applications

Pre-coordination is useful in moderation
Commonly used terms should be pre-coordinated
  e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
Avoid taking to extremes
  cf. ICD-9
Where do we draw the line?
–ontologies should be built around one or a few axes of classification
  term ‘explosion’ typically gets large when multiple axes are combined
–we can change our minds later
  pre- and post-coordination are commensurable

Commensurability
Annotator annotates to
–nucleus^part_of(astrocyte)
Anatomy editor creates new term
–uses oboedit cross-product plugin
–astrocyte_nucleus = nucleus^part_of(astrocyte)
Annotation can be dynamically ‘promoted’ to new term in answer to queries
–various software techniques for achieving this
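One of the simplest techniques for this promotion is a lookup from a normalized expression to the pre-coordinated term that shares its definition. The sketch below assumes expressions are compared as strings after whitespace is removed. The names `PRECOORDINATED` and `promote` are hypothetical, and a real system would compare parsed definitions, not text.

```python
# Registry of pre-coordinated terms, keyed by their logical definition.
PRECOORDINATED = {
    "nucleus^part_of(astrocyte)": "astrocyte_nucleus",
}


def normalize(expression):
    """Strip all whitespace so trivially different spellings compare equal."""
    return "".join(expression.split())


def promote(expression, registry=PRECOORDINATED):
    """Return the pre-coordinated term if one exists, else the expression itself."""
    key = normalize(expression)
    return registry.get(key, key)


# An old post-coordinated annotation is answered with the new term...
print(promote("nucleus ^ part_of(astrocyte)"))
# ...while an expression with no matching term is returned unchanged.
print(promote("nucleus^part_of(neuron)"))
```

Because the stored annotation is left untouched and only the query result changes, the editor can add or retire pre-coordinated terms without rewriting existing annotations.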

Post-coordination in GO annotations
Pre- and post-coordination are compatible and commensurable
We should extend the annotation format to allow denoting more specific classes
–e.g. cholesterol transport in liver
–advanced applications can query this
–standard applications suffer no loss
–extended annotations can be used to help seed new terms in the ontology
This is already being done (MGI, Dicty)
–we just want to capture this in an interoperable way

Post-composition in gene association files
New column in GA file format
Gene Product   Term ID                          …   Properties
AABC1          GO: (cholesterol transport)      …   located_in(MA:liver)
AABC2          GO: (neuron fate development)    …   has_participant(FBbt:Y_neuron)
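A consumer of the proposed column only needs to pull relation and filler pairs out of each value. The sketch below assumes the column holds one or more `relation(ID)` items as in the two example rows. The separator between multiple items is not specified on the slide, so the regular expression simply finds every such item wherever it occurs.

```python
import re

# relation name, then an identifier in parentheses, e.g. located_in(MA:liver)
PROPERTY = re.compile(r"(\w+)\(([^()]+)\)")


def parse_properties(column_value):
    """Return the (relation, filler id) pairs found in a Properties column value."""
    return PROPERTY.findall(column_value)


print(parse_properties("located_in(MA:liver)"))
print(parse_properties("has_participant(FBbt:Y_neuron)"))
```

A tool that does not understand the column can ignore it and still read the term ID, which is the sense in which standard applications suffer no loss.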

Database issues
Chado and GO DB can handle pre- and post-coordination
–in theory anyway
  not yet fully tested
How does it work?
–‘anonymous term’ created for coordinated term
–documentation in chado cvs
  chado/modules/cv/doc/