Everything must be logical. Terms should be in the right place, with good definitions and the correct relationships.

I need very specific terms to describe my sequences. I need these terms yesterday.

DATA SHARING

Annotations come in many varieties: GenBank, SwissProt, CHADO and GFF3, to name a few.
MODs use different storage mechanisms, even if they use the same vocabulary.
MODs may use different *legitimate* ways to describe the same thing. A split-location CDS conveys the same information as exon/intron structure with start and stop codons.
We need to share SO-compliant annotations between groups.
We need to convert SO-compliant annotations to other formats, for example to submit to GenBank.

What needs to be done
Build upon existing adapters.
Develop mapping files between SO and other formats.

CROSS PRODUCT TERMS

Cross products are terms created by joining two (or more) concepts from different ontologies, or from different aspects of the same ontology. In SO we have located_sequence_feature terms, which can be placed within coordinates on a sequence, and sequence_attributes, terms which cannot exist on their own but describe a property of a feature. An example of a cross product of two of these terms is:

    [Term]
    id: SO:
    name: engineered_gene
    intersection_of: SO: ! gene
    intersection_of: has_quality SO: ! engineered

How does this help
Terms can be generated quickly. OBO-Edit can help.
The readable definitions are easy to produce.
The terms are computable: they are defined by rules.

What needs to be done
Many of the terms in the sequence attributes were created before cross products became mature, and so are 'placeholders' at the moment. These terms must be evaluated, the quality extracted and the cross products made. For example, mRNA_with_frameshift should not be a sequence attribute but a cross product of mRNA with the quality frameshifted.
Parsers and reasoners need to be aware of cross products if they are not already.
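To make "computable; defined by rules" concrete, here is a minimal Python sketch of how a parser might read a cross product stanza and derive a readable definition from it. It is not the OBO-Edit implementation: the function names are invented for illustration, the stanza handling covers only the tags shown in the example, and the IDs are placeholders because the accessions are not given in the text.

```python
# Sketch only: reads one OBO [Term] stanza that uses intersection_of tags.
# The IDs below are placeholders, NOT real SO accessions.
STANZA = """\
[Term]
id: SO:PLACEHOLDER_1
name: engineered_gene
intersection_of: SO:PLACEHOLDER_2 ! gene
intersection_of: has_quality SO:PLACEHOLDER_3 ! engineered
"""


def parse_cross_product(stanza):
    """Return (name, genus, differentia) for a cross product stanza.

    differentia is a list of (relationship, label) pairs taken from the
    intersection_of lines that carry a relationship.
    """
    name, genus, differentia = None, None, []
    for line in stanza.splitlines():
        tag, _, value = line.partition(": ")
        if tag == "name":
            name = value.strip()
        elif tag == "intersection_of":
            target, _, label = value.partition(" ! ")
            parts = target.split()
            if len(parts) == 1:
                # A bare ID is the genus, i.e. the feature being refined.
                genus = label.strip()
            else:
                # "relationship ID" is a differentia, e.g. has_quality.
                differentia.append((parts[0], label.strip()))
    return name, genus, differentia


def readable_definition(genus, differentia):
    """Compose a human-readable definition from the logical one."""
    article = "An" if genus[0].lower() in "aeiou" else "A"
    clauses = " and ".join(f"{rel} {label}" for rel, label in differentia)
    return f"{article} {genus} that {clauses}."


name, genus, differentia = parse_cross_product(STANZA)
print(name, "=", readable_definition(genus, differentia))
```

The same two functions would apply unchanged to a stanza for mRNA_with_frameshift once it is expressed as mRNA with the quality frameshifted.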
DISPLAY

Users of the ontology have many reasons to view and browse the ontology:
To look for the correct term to annotate with.
To understand the composition of a concept.
To find the detail of the terminology.
To find the right level for their query.

Search and display functionality already exists in the Gene Ontology Consortium editor OBO-Edit. This tool, however, provides more functionality than a casual user, or an experienced user with a quick query, is looking for. We have developed miSO as a prototype browsing tool for SO (and any other modest-sized OBO ontology). It is web based, and uses JavaScript to provide both a graphical tree-structure view and an autocompleting term finder. Each term is shown with its definition, links to cross references and its parent/child relationships.

Advantages of this approach
Easy-to-use web page
Quick browsing
Up to date with the latest release
(Also displays metadata fields)

Future work
It should display different kinds of relationships in the tree.
It should also search synonyms.
It will need to understand cross product terms and display them correctly.

VALIDATION SERVICES

If MODs are going to make and distribute SO-compliant annotations, we must provide tools and services to aid this process.
Does the GFF3 contain the controlled vocabulary? (We have found some examples of MOD GFF3 that do not adhere to the SO controlled vocabulary; for example, the word 'coding' is used instead of CDS.)
Do the assertions made in the annotation match the knowledge in the ontology?

We are proposing a web-based validation service for annotation formats such as GFF3, where the developer can paste in annotation and view the validation process to ensure that the semantic and syntactic information is correct.

The Sequence Ontology has reached a point in development where we now have an annotating community and a software community dispersed within the model organism groups, and prospective users are now also appearing.
This has several implications, and SO is being pulled in different directions. The annotation community wants new and more specific terms to be generated quickly, the software community needs to make use of the knowledge contained in the ontology, and prospective users want to be able to look and learn. Here we outline approaches to meet these needs.

I am a naïve user.
I want to learn more about SO.
I want to look at SO.
I want to query annotations using SO.

I am a skilled user.
I want to make my own SO-compliant annotations.
I want to query the annotations using SO.

Software must understand the meaning of relationships.
Software must be able to traverse sequence data labeled with ontology terms.

DEFINITION OVERHAUL

The definition should reflect what the sequence is, not what the corresponding molecule is.

There are several problems with some of the textual definitions in SO, especially where we have used an existing definition from another source. For example, definitions should not use terms like RNA or DNA, because sequence can undergo transformations. We can do things like locate the catalytic sequence of a protein back to its nucleotide sequence, so pinning a region down to one kind of molecule does not make a lot of sense. We also need to make sure that our definitions do not contradict the relationships in the ontology. Currently the definition of exon restricts it to being a part of mRNA only, but in the ontology it is part of a transcript, so all kinds of transcript may have exons. (Non-coding sequence may also have exons.) The definition overhaul will involve many of the terms we have been taking for granted, such as exon and transcript.

UNDERSTANDING RELATIONSHIPS

Many of the relationships in SO are transitive. We can follow them and make inferences about things labeled with the terms. But what do the relationships actually mean with regard to the sequence?
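As an illustration of following a transitive relationship to make inferences, the sketch below computes every term reachable from a starting term. The graph is a hypothetical toy example with made-up edges, not an extract from an SO release, and `ancestors` is an illustrative helper name.

```python
# Hypothetical toy graph: each term maps to the terms it is directly part of.
# The edges are illustrative only and are not copied from an SO release.
PART_OF = {
    "exon": ["transcript"],
    "transcript": ["gene"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following the relationship.

    Because the relationship is transitive, anything reachable in several
    steps can be inferred just as if it had been asserted directly.
    """
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


# exon part_of transcript and transcript part_of gene, so exon part_of gene.
print(sorted(ancestors("exon", PART_OF)))
```

Whether such an inference is valid for a given pair of features depends on what the relationship means for the sequence, which is the question the rest of this section addresses.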
We have started to subtype the part_of relationship into two of the six subtypes classified in Winston et al. (1987). In a member_of relationship, the part is not necessarily spatially connected to a cohesive whole. For example, regulatory regions may be found many kilobases away from the gene, or even on another chromosome. A composite_part_of relationship, however, denotes that the part must be enclosed within the whole. This can be used computationally to validate the annotation. We are also using some spatial relationships (see Egenhofer) to describe the topology between features. adjacent_to (meets) means that the two features share a boundary (junction) in common.

What needs to be done
The relationships used in SO must be fully documented, and the ontology updated to reflect exactly what we mean by our spatial and containment relationships, so that reasoning can be more effective.

Winston M, Chaffin R, Herrmann: A taxonomy of part-whole relations. Cog Sci 1987, 11:
Egenhofer MJ: A formal definition of binary topological relationships. Lecture Notes Comp Sci 1989, 367:
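The two checks described under VALIDATION SERVICES, together with the enclosure rule for composite_part_of, could be sketched roughly as follows. This is not the proposed web service. The vocabulary subset, the composite_part_of pairs, the sample annotation and the function names are all assumptions made for illustration, and the parser assumes well-formed nine-column GFF3 lines with at most one Parent per feature.

```python
# Assumed, illustrative subsets; a real service would load both from the
# ontology release rather than hard-coding them.
SO_TERMS = {"gene", "mRNA", "exon", "CDS"}
COMPOSITE_PART_OF = {("exon", "mRNA"), ("CDS", "mRNA")}

# Made-up annotation containing one term outside the vocabulary ('coding')
# and one exon that extends beyond its parent.
SAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\t.\texon\t50\t300\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\t.\tcoding\t150\t300\t.\t+\t0\tID=cds1;Parent=mrna1",
])


def parse_gff3(text):
    """Yield one dict per feature line (nine tab-separated columns)."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        yield {
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "id": attrs.get("ID"),
            "parent": attrs.get("Parent"),
        }


def validate(text):
    """Return a list of human-readable problems found in the annotation."""
    features = list(parse_gff3(text))
    by_id = {f["id"]: f for f in features if f["id"]}
    problems = []
    for f in features:
        # Check 1: is the feature type in the controlled vocabulary?
        if f["type"] not in SO_TERMS:
            problems.append(f"unknown type '{f['type']}'")
        # Check 2: a composite part must be enclosed within its whole.
        parent = by_id.get(f["parent"])
        if parent and (f["type"], parent["type"]) in COMPOSITE_PART_OF:
            if f["start"] < parent["start"] or f["end"] > parent["end"]:
                problems.append(f"{f['id']} is not enclosed within {parent['id']}")
    return problems


for problem in validate(SAMPLE):
    print(problem)
```

A member_of pair is deliberately left out of the enclosure check, since the part need not be spatially connected to the whole.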