

Relations in GO for 2009

Intro
We have many relations ready to GO live in the scratch directory
– within GO ontologies
– across GO ontologies
– between GO and external ontologies
– both cross-product (N+S conditions) and regular links
Requires a fundamental change in how we and our users think about GO and annotations
– Tools that make use of these will better serve users

Relations in GO
In the beginning there was is_a and part_of
– Benefits: simplicity
  We could effectively ignore relations
  Most tools and users effectively do this
  – Speculation: recent introduction of regulates had no effect on majority of users
– Drawbacks: lack of expressivity
We need more relations
– Regulation
– Spatial relations
– has_part for Process-Function
– annotations

Example of a relation rule in GO
Rule:
– A is_a B, B is_a C → A is_a C
We can generalize this by having a rule for transitive relations
– transitive r, A r B, B r C → A r C
We can also write this as a composition rule:
– is_a . is_a → is_a
– Open question: does this notation help or hinder?
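
The generalized rule for transitive relations can be sketched in a few lines of Python. This is only an illustration, not GO tooling: the function name `transitive_closure` and the placeholder terms A to D are hypothetical.

```python
def transitive_closure(links):
    """Apply 'A r B, B r C -> A r C' until no new links appear."""
    closure = set(links)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical is_a links between placeholder terms.
asserted = {("A", "B"), ("B", "C"), ("C", "D")}
inferred = transitive_closure(asserted) - asserted
print(sorted(inferred))  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

The same function works for any relation declared transitive; it should not be applied to a relation such as regulates, which is not defined to be transitive.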

Transitivity
We currently have two transitive relations in GO:
– is_a . is_a → is_a
– part_of . part_of → part_of
Example:
– mitotic prophase part_of mitosis
– In GO, part_of is an all-some relation
regulates is not defined to be transitive in GO (but the majority of tools still treat it as if it were!)

Composition with is_a
Any relation that follows the all-some pattern composes with is_a to itself
Example:
– (all) nucleus part_of (some) cell
Composition:
– is_a . R → R
– R . is_a → R
Example:
– (all) mitotic prophase part_of (some) mitosis
– mitosis is_a cell cycle phase
– → (all) mitotic prophase part_of (some) cell cycle phase

Composition Table
Rows and columns: is_a, part_of
Read the row first, then the column (so far the table is symmetric)

Composition Table
Rows and columns: is_a, part_of
mitotic prophase part_of mitosis is_a cell cycle phase
→ (all) mitotic prophase part_of (some) cell cycle phase

Chained compositions
Table rows and columns: is_a, part_of
A part_of B is_a C is_a D part_of E
→ A part_of B is_a D part_of E
→ A part_of D part_of E
→ A part_of E
order of reduction does not matter
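
A minimal sketch of the chained reduction, assuming the composition table contains exactly the four is_a / part_of entries given by the rules above. The names `COMPOSE`, `reduce_left` and `reduce_right` are hypothetical; the point is that reducing from either end gives the same answer.

```python
from functools import reduce

# Composition table: key is (row, column), value is the composed relation.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}


def compose(r1, r2):
    return COMPOSE[(r1, r2)]


def reduce_left(chain):
    """Reduce the chain starting from its left end."""
    return reduce(compose, chain)


def reduce_right(chain):
    """Reduce the chain starting from its right end."""
    return reduce(lambda acc, rel: compose(rel, acc), reversed(chain))


# A part_of B is_a C is_a D part_of E
chain = ["part_of", "is_a", "is_a", "part_of"]
print(reduce_left(chain), reduce_right(chain))  # part_of part_of
```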

regulates transitive_over part_of
regulates . part_of → regulates
[Figure: inferred links]

regulates transitive_over part_of
regulates . part_of → regulates
[Figure: inferred links]
(all) RoSPoMCC regulates (some) MCC


Composition Table: Regulates
Rows and columns: is_a, part_of, regulates
part_of . regulates: -
regulates . regulates: -

Composition Table: Regulates
Rows and columns: is_a, part_of, regulates
part_of . regulates: -
regulates . regulates: -
regulates . part_of → regulates

Composition Table: Regulates
Rows and columns: is_a, part_of, regulates
regulates . regulates: -
part_of . regulates → N/A

Rows and columns: is_a, part_of, regulates
part_of . regulates: -
regulates . regulates → indirectly regulates
We have the option of defining additional relations
These may be entirely implicit (i.e. we would never assert indirectly regulates in GO)

Rows and columns: is_a, part_of, regulates, indirectly regulates
part_of . regulates: -
part_of . indirectly regulates: -
regulates . regulates → indirectly regulates
Regulates is not transitive
Indirectly regulates is transitive
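
The table with regulates and indirectly regulates can be sketched as a lookup in which a missing entry means "no inference". This is an illustration under assumptions: only the cells stated in the slides are filled in (composition with is_a, part_of . part_of, regulates . part_of, regulates . regulates, and the transitivity of indirectly regulates); every other cell is treated here as yielding no inference, and the names `TABLE` and `compose_chain` are hypothetical.

```python
IS_A, PART_OF, REG, IND = "is_a", "part_of", "regulates", "indirectly_regulates"

TABLE = {}
for rel in (IS_A, PART_OF, REG, IND):
    TABLE[(IS_A, rel)] = rel  # is_a . R -> R
    TABLE[(rel, IS_A)] = rel  # R . is_a -> R
TABLE[(PART_OF, PART_OF)] = PART_OF
TABLE[(REG, PART_OF)] = REG
TABLE[(REG, REG)] = IND
TABLE[(IND, IND)] = IND
# (PART_OF, REG) is deliberately absent: part_of . regulates -> N/A


def compose_chain(chain):
    """Reduce a chain left to right; return None when no inference applies."""
    result = chain[0]
    for rel in chain[1:]:
        result = TABLE.get((result, rel))
        if result is None:
            return None
    return result


print(compose_chain([REG, PART_OF]))  # regulates
print(compose_chain([PART_OF, REG]))  # None
print(compose_chain([REG, REG]))      # indirectly_regulates
```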

Columns: is_a, part_of, regulates, indirectly regulates
Row is_a: I, P, R, ~R
Row part_of: P, P, -, -
Row regulates: R, R, ~R
Row indirectly regulates: ~R
USE SYMBOLS? OR IS THIS GETTING TOO ABSTRACT?

Sub-relations regulates – negatively_regulates – positively_regulates

Columns: is_a, part_of, regulates, + regulates, - regulates
Row is_a: I, P, R, +R, -R
Row part_of: P, P
Row regulates: R, R
Row + regulates: +R
Row - regulates: -R

Columns: is_a, part_of, regulates, + regulates, - regulates, indirectly regulates
Row is_a: I, P, R, +R, -R, ~R
Row part_of: P, P, -, -, -, -
Row regulates: R, R, ~R
Row + regulates: +R, ~+R, ~-R, ~R
Row - regulates: -R, ~-R, ~+R, ~R
Row indirectly regulates: ~R
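
One reading of the signed rows of this table is that the signs multiply: + composed with + or - composed with - gives ~+R, and mixed signs give ~-R. The sketch below encodes that reading. It is an assumption, not a statement of GO policy: cells that are blank in the table (for example + regulates . regulates) are assumed to fall back to ~R, and the long relation names in `INDIRECT` are hypothetical stand-ins for ~R, ~+R and ~-R.

```python
# +1 / -1 for the signed sub-relations, 0 for plain (unsigned) regulates.
SIGN = {
    "regulates": 0,
    "positively_regulates": +1,
    "negatively_regulates": -1,
}

# Hypothetical names for the implied relations written ~R, ~+R and ~-R.
INDIRECT = {
    0: "indirectly_regulates",
    +1: "indirectly_positively_regulates",
    -1: "indirectly_negatively_regulates",
}


def compose_regulation(r1, r2):
    """Compose two regulates-type relations by multiplying their signs."""
    return INDIRECT[SIGN[r1] * SIGN[r2]]


print(compose_regulation("negatively_regulates", "negatively_regulates"))
# indirectly_positively_regulates
print(compose_regulation("positively_regulates", "negatively_regulates"))
# indirectly_negatively_regulates
```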

Sub-relations + indirect
R, R+, R-: normal regulates relations, asserted in GO
~R, ~R+, ~R-: indirect regulates relations, never asserted, only implied

Regulation relation lattice
RD, RD+, RD-: renamed to DIRECTLY regulates?
~R, ~R+, ~R-: indirect regulates relations, never asserted, only implied
~R G, ~R G+, ~R G-: super-relation of indirect and direct regulation (call this one “regulates”?)

has_part
NOT the inverse of part_of at the ontology level
Example:
– nucleus part_of cell: YES
  every nucleus is part_of some cell
  – by definition; e.g. extruded nuclei are ex-nuclei
– cell has_part nucleus: NO
  not every cell has_part nucleus
  – mammalian erythrocytes, bacteria
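
The asymmetry follows from the all-some reading and can be checked on a toy set of instances. The instance names below are invented for illustration, and `all_some` is a hypothetical helper, not part of any GO tool.

```python
def all_some(sources, targets, links):
    """True if every source instance is linked to at least one target instance."""
    return all(any((s, t) in links for t in targets) for s in sources)


# Hypothetical instances: two nucleated cells and one mammalian erythrocyte.
nuclei = {"nucleus1", "nucleus2"}
cells = {"cell1", "cell2", "erythrocyte1"}
part_of = {("nucleus1", "cell1"), ("nucleus2", "cell2")}

# At the instance level has_part is simply part_of reversed.
has_part = {(whole, part) for part, whole in part_of}

print(all_some(nuclei, cells, part_of))   # True: nucleus part_of cell holds
print(all_some(cells, nuclei, has_part))  # False: cell has_part nucleus does not
```

The instance-level links are exact inverses of each other, yet only one of the two type-level statements holds, because the erythrocyte has no nucleus.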

Annotations and relations
Not just an ontology issue
– this is of relevance to annotations too
The current simple methodology of propagating annotations up the graph only works for a small subset of relations
– To understand how annotations and new relations interact we must think in terms of gene product relations

Gene product relations
What is the relation between a gene product and
– A molecular function?
– A biological process?
– A cellular component?
Why care? What’s wrong with “annotated_to”?
– We need to define these relations:
  to do justice to the biology
  to be able to deal with new relations within the GO itself

Why we should care
How should annotation queries, analysis tools (slimmers, enrichment tools) etc. treat the (pseudo-)new regulates relation?
How should we recommend the process-function links be visualized?
How should these links be treated in queries?

Proposed relations for gene products
For MF and BP:
– has_potential
– has_function_during
For CC:
– localized_to
– This is more specific than has_location
  A gene product may travel through different locations
– Formally: GP localized_to CC : GP executes some function in CC
Names TBD
MFs are ontologically like BPs (BFO processes)…

How to read a GAF
The gene product may not be explicitly in the GAF
– that’s OK
– gene as proxy
The relation does NOT apply to the gene, however
– genes are only localized_to chromosomes, and only participate in gene expression. It’s the products that do the work
The relation is implicit, depending on F, C or P

Annotation relation composition
is_a
– always propagate over is_a
– localized_to . is_a → localized_to
– has_function_in . is_a → has_function_in
part_of
– localized_to . part_of → localized_to
– has_function_in . part_of → has_function_in
This is effectively what we do with gene product annotations now
– post-hoc logical justification for why it’s OK to propagate
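
Propagation over is_a and part_of can be sketched as a simple graph walk. The tiny ontology below reuses terms from the earlier examples; the data structure `PARENTS` and the function `propagate` are hypothetical and only illustrate the composition rules, not how any GO tool is implemented.

```python
# Toy ontology: term -> list of (relation, parent term).
PARENTS = {
    "nucleus": [("part_of", "cell")],
    "mitotic prophase": [("part_of", "mitosis")],
    "mitosis": [("is_a", "cell cycle phase")],
}

# localized_to and has_function_in both compose to themselves over these links.
PROPAGATES_OVER = {"is_a", "part_of"}


def propagate(relation, term):
    """Return every (relation, term) pair implied by one direct annotation."""
    reached = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for link, parent in PARENTS.get(current, []):
            if link in PROPAGATES_OVER and parent not in reached:
                reached.add(parent)
                stack.append(parent)
    return {(relation, t) for t in reached}


for pair in sorted(propagate("localized_to", "nucleus")):
    print(pair)
```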

Annotation relation composition: regulates
regulates
– localized_to . regulates → NEVER POSSIBLE
  localized_to never has a process as target
  regulates always has process as subject
– has_function_in . regulates → regulator_of
  This introduces an additional implicit relation that can be used to sum gene product results
– Fake AmiGO screenshot here
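
A sketch of how the implicit regulator_of relation could be used to group gene product results for one process. The gene product identifiers gp1 to gp3 are invented, the function name is hypothetical, and only direct regulates links are considered.

```python
def group_gene_products(annotations, regulates_links, process):
    """Split gene products into direct participants and inferred regulators."""
    result = {"has_function_in": set(), "regulator_of": set()}
    for gene_product, relation, term in annotations:
        if relation != "has_function_in":
            continue  # localized_to . regulates is never possible
        if term == process:
            result["has_function_in"].add(gene_product)
        if (term, process) in regulates_links:
            # has_function_in . regulates -> regulator_of
            result["regulator_of"].add(gene_product)
    return result


regulates_links = {("RoSPoMCC", "MCC")}
annotations = [
    ("gp1", "has_function_in", "RoSPoMCC"),
    ("gp2", "has_function_in", "MCC"),
    ("gp3", "localized_to", "nucleus"),
]
print(group_gene_products(annotations, regulates_links, "MCC"))
```

Keeping the two groups separate lets a query report regulators alongside direct participants without merging them.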

Annotation relation composition: inter-ontology links
We have 183 CC->MF/BP links in scratch
– localized_to . has_function_in → ??may_contribute_to??
Example:
– RPS25A localized_to ribosome
– ribosome has_function_in protein biosynthesis
– → RPS25A ??has_function_in?? protein biosynthesis
No need for curator to make explicit IC claims
Q: we never want “may” in relation names? Can we make a stronger claim?
How does a curator know when to make an IC claim here?
Potential confusion with contributes_to qualifier

Annotation relations and has_part Need some graphical illustrations See – – for now

Qualifiers
Annotation qualifiers (contributes_to) have the effect of modifying the relation
– NOT is not a qualifier – it is a logical operator
We can add new relations to the qualifier column
– geneProductA acted_on_during protein secretion by the type II secretion system

Secondary taxon IDs

Cell component relations
We have 674 xp defs within CC in scratch
– adjacent_to
– surrounds/surrounded_by
– spans
– overlaps
Use case: Reactome
Can we say anything about gene products here?
– we can perform spatial gene product queries

Spatial reasoning
– spans . adjacent_to → overlaps (??TBD!!)
– SUN-KASH complex spans nuclear inner membrane
– nuclear inner membrane adjacent_to nuclear lumen
– → SUN-KASH complex overlaps nuclear lumen
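
The proposed spatial rule can be tried out on the example above. This sketch treats spans . adjacent_to → overlaps as given even though the slide marks it as still to be decided; `SPATIAL_RULES` and `infer_spatial` are hypothetical names.

```python
# Single rule, marked TBD in the slide: spans . adjacent_to -> overlaps
SPATIAL_RULES = {("spans", "adjacent_to"): "overlaps"}


def infer_spatial(links):
    """Return links implied by one application of the spatial rules."""
    inferred = set()
    for a, r1, b in links:
        for b2, r2, c in links:
            rule = SPATIAL_RULES.get((r1, r2))
            if b == b2 and rule is not None:
                inferred.add((a, rule, c))
    return inferred


links = {
    ("SUN-KASH complex", "spans", "nuclear inner membrane"),
    ("nuclear inner membrane", "adjacent_to", "nuclear lumen"),
}
print(infer_spatial(links))
# {('SUN-KASH complex', 'overlaps', 'nuclear lumen')}
```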

Links from BP to external ontologies
Process-continuant links
– A has_function_in cysteine biosynthesis → A ??has_participant?? cysteine
  this is true but can we make stronger claims
– A has_function_in heart development → A has_participant heart
  c.f. heart process, TAZ gene
How can we use this?
– Browse GO annotations via other ontologies
– Enrichment using anatomy terms…
– AmiGO screenshots

what next?

Won’t this confuse users?
We will provide a pre-made inferred relation table for all of GO
– we could do this for gene products too but it would be over a billion entries
We can always distribute a dumbGO
– just is_a and part_of, not even regulates
Need more guidance on how this can be used
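
A pre-made inferred relation table could be produced by applying the composition rules until nothing new appears. The sketch below does this for a three-link toy ontology; it is not the actual GO pipeline, the `COMPOSE` table covers only is_a, part_of and regulates, and "process X" is an invented term.

```python
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
    ("is_a", "regulates"): "regulates",
    ("regulates", "is_a"): "regulates",
    ("regulates", "part_of"): "regulates",
}


def inferred_table(asserted):
    """Return asserted plus inferred (subject, relation, object) rows."""
    table = set(asserted)
    while True:
        new = {
            (a, COMPOSE[(r1, r2)], c)
            for (a, r1, b) in table
            for (b2, r2, c) in table
            if b == b2 and (r1, r2) in COMPOSE
        }
        if new <= table:  # nothing new: fixed point reached
            return table
        table |= new


asserted = {
    ("mitotic prophase", "part_of", "mitosis"),
    ("mitosis", "is_a", "cell cycle phase"),
    ("process X", "regulates", "mitotic prophase"),  # hypothetical term
}
for row in sorted(inferred_table(asserted) - asserted):
    print(row)
```

Users who only need the table never have to apply the rules themselves, which is the point of shipping it pre-computed.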

Discussion

What’s next?
Move relations into GO editors file – post OE2
– CC-self spatial relations
– BP->MF has_part regulates
– BP->BP has_part (??)
– External onts
Dual releases? dumbGO and fullGO?
Fix GOC tools (AmiGO, slimmer, enrichment, graphviz, refG) to deal appropriately
– OE2 should already be fine
Educate non-GOC folks